summaryrefslogtreecommitdiff
path: root/rts
Commit message (Collapse)AuthorAgeFilesLines
* Remove unused closure stg_IND_directÖmer Sinan Ağacan2020-04-021-9/+0
|
* Fix a pointer format string in RTSÖmer Sinan Ağacan2020-04-021-1/+1
|
* Update sanity checking for TSOs:Ömer Sinan Ağacan2020-03-161-36/+22
| | | | | | | | | | | | | | | | - Remove an invalid assumption about GC checking what_next field. The GC doesn't care about what_next at all, if a TSO is reachable then all its pointers are followed (other than global_tso, which is only followed by compacting GC). - Remove checkSTACK in checkTSO: TSO stacks will be visited in checkHeapChain, or checkLargeObjects etc. - Add an assertion in checkTSO to check that the global_link field is sane. - Did some refactor to remove forward decls in checkGlobalTSOList and added braces around single-statement if statements.
* Fix global_link of TSOs for threads reachable via dead weaksÖmer Sinan Ağacan2020-03-151-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes #17785 Here's how the problem occurs: - In generation 0 we have a TSO that is finished (i.e. it has no more work to do or it is killed). - The TSO only becomes reachable after collectDeadWeakPtrs(). - After collectDeadWeakPtrs() we switch to WeakDone phase where we don't move TSOs to different lists anymore (like the next gen's thread list or the resurrected_threads list). - So the TSO will never be moved to a generation's thread list, but it will be promoted to generation 1. - Generation 1 collected via mark-compact, and because the TSO is reachable it is marked, and its `global_link` field, which is bogus at this point (because the TSO is not in a list), will be threaded. - Chaos ensues. In other words, when these conditions hold: - A TSO is reachable only after collectDeadWeakPtrs() - It's finished (what_next is ThreadComplete or ThreadKilled) - It's retained by mark-compact collector (moving collector doesn't evacuate the global_list field) We end up doing random mutations on the heap because the TSO's global_list field is not valid, but it still looks like a heap pointer so we thread it during compacting GC. The fix is simple: when we traverse old_threads lists to resurrect unreachable threads the threads that won't be resurrected currently stays on the old_threads lists. Those threads will never be visited again by MarkWeak so we now reset the global_list fields. This way compacting GC does not thread pointers to nowhere. Testing ------- The reproducer in #17785 is quite large and hard to build, because of the dependencies, so I'm not adding a regression test. In my testing the reproducer would take a less than 5 seconds to run, and once in every ~5 runs would fail with a segfault or an assertion error. In other cases it also fails with a test failure. Because the tests never fail with the bug fix, assuming the code is correct, this also means that this bug can sometimes lead to incorrect runtime results. After the fix I was able to run the reproducer repeatedly for about an hour, with no runtime crashes or test failures. To run the reproducer clone the git repo: $ git clone https://github.com/osa1/streamly --branch ghc-segfault Then clone primitive and atomic-primops from their git repos and point to the clones in cabal.project.local. The project should then be buildable using GHC HEAD. Run the executable `properties` with `+RTS -c -DZ`. In addition to the reproducer above I run the test suite using: $ make slowtest EXTRA_HC_OPTS="-debug -with-rtsopts=-DS \ -with-rtsopts=-c +RTS -c -RTS" SKIPWAY='nonmoving nonmoving_thr' This enables compacting GC always in both GHC when building the test programs and when running the test programs, and also enables sanity checking when running the test programs. These set of flags are not compatible for all tests so there are some failures, but I got the same set of failures with this patch compared to GHC HEAD.
* rts: Fix printClosure when printing fwd ptrsÖmer Sinan Ağacan2020-03-141-1/+1
|
* nonmoving: Remove redundant bitmap clearingBen Gamari2020-03-141-2/+4
| | | | | nonmovingSweep already clears the bitmap in the sweep loop. There is no reason to do so a second time.
* nonmoving: Don't traverse filled segment list in pauseBen Gamari2020-03-142-18/+26
| | | | | | | | | The non-moving collector would previously walk the entire filled segment list during the preparatory pause. However, this is far more work than is strictly necessary. We can rather get away with merely collecting the allocators' filled segment list heads and process the lists themselves during the concurrent phase. This can significantly reduce the maximum gen1 GC pause time in programs with high rates of long-lived allocations.
* Zero any slop after compaction in compacting GCÖmer Sinan Ağacan2020-03-111-4/+21
| | | | | | | | | | | | | In copying GC, with the relevant debug flags enabled, we release the old blocks after a GC, and the block allocator zeroes the space before releasing a block. This effectively zeros the old heap. In compacting GC we reuse the blocks and previously we didn't zero the unused space in a compacting generation after compaction. With this patch we zero the slop between the free pointer and the end of the block when we're done with compaction and when switching to a new block (because the current block doesn't have enough space for the next object we're shifting).
* rts: Prefer darwin-specific getCurrentThreadCPUTimeBen Gamari2020-03-111-15/+25
| | | | | | | macOS Catalina now supports a non-POSIX-compliant version of clock_gettime which cannot use the clock_gettime codepath. Fixes #17906.
* rts: refactor and comment profile localesJean-Baptiste Mazon2020-03-091-15/+52
|
* rts: ensure C numerics in heap profiles using Windows locales if neededJean-Baptiste Mazon2020-03-091-19/+32
|
* Fix Windows breakage by not touching locales on WindowsJean-Baptiste Mazon2020-03-091-0/+19
|
* nonmoving: Fix collection of sparksBen Gamari2020-03-094-6/+53
| | | | | | | | | | Previously sparks living in the non-moving heap would be promptly GC'd by the minor collector since pruneSparkQueue uses the BF_EVACUATED flag, which non-moving heap blocks do not have set. Fix this by implementing proper support in pruneSparkQueue for determining reachability in the non-moving heap. The story is told in Note [Spark management in the nonmoving heap].
* rts: Report nonmoving collector statistics in machine-readable outputBen Gamari2020-03-051-0/+20
|
* Stats: Add sync pauses to +RTS -S outputBen Gamari2020-03-051-0/+3
|
* nonmoving-gc: Track time usage of nonmoving markingBen Gamari2020-03-057-24/+212
|
* rts: Add getCurrentThreadCPUTime helperBen Gamari2020-03-053-21/+71
|
* Be explicit about how stack usage of mvar primops are covered.Andreas Klebinger2020-03-041-0/+1
| | | | | | This fixes #17893 [skip-ci]
* rts: enforce POSIX numeric locale for heap profilesJean-Baptiste Mazon2020-02-291-0/+30
|
* Document and refactor a few things around bitmap scavengingÖmer Sinan Ağacan2020-02-291-0/+4
| | | | | | | | - Added a few comments in StgPAP - Added a few comments and assertions in scavenge_small_bitmap and walk_large_bitmap - Did tiny refactor in GHC.Data.Bitmap: added some comments, deleted dead code, used PlatformWordSize type.
* nonmoving: Fix marking in compact regionsBen Gamari2020-02-281-11/+13
| | | | | | | | | Previously we were tracing the object we were asked to mark, even if it lives in a compact region. However, there is no need to do this; we need only to mark the region itself as live. I have seen a segfault due to this due to the concurrent mark seeing a an object in the process of being compacted by the mutator.
* Modules: Core (#13009)Sylvain Henry2020-02-261-1/+1
| | | | Update haddock submodule
* check for safe arguments, raising error when invalid (fix #17720)Stefan Pavikevik2020-02-241-3/+53
|
* rts: Add more debug output to failed path in onIOCompleteBen Gamari2020-02-141-2/+3
| | | | This will help track down #17035.
* Module hierarchy: ByteCode and Runtime (cf #13009)Sylvain Henry2020-02-122-2/+2
| | | | Update haddock submodule
* Add arithmetic exception primops (#14664)Sylvain Henry2020-02-115-0/+34
|
* rts: Remove incorrect assertions around MSG_THROWTO messagesBen Gamari2020-02-111-3/+0
| | | | | | | | | | Previously we would assert that threads which are sending a `MSG_THROWTO` message must have their blocking status be blocked on the message. In the usual case of a thread throwing to another thread this is guaranteed by `stg_killThreadzh`. However, `throwToSelf`, used by the GC to kill threads which ran out of heap, failed to guarantee this. Noted while debugging #17785.
* rts: Use nanosleep instead of usleepBen Gamari2020-02-113-2/+22
| | | | usleep was removed in POSIX.1-2008.
* rts: Fix need_prealloc being reset when retainer profiling is onDaniel Gröber2020-02-081-1/+1
|
* rts: Fix Arena blocks accounting for MBlock sized allocationsDaniel Gröber2020-02-081-1/+1
| | | | | | | | When requesting more than BLOCKS_PER_MBLOCK blocks allocGroup can return a different number of blocks than requested. Here we use the number of requested blocks, however arenaFree will subtract the actual number of blocks we got from arena_blocks (possibly) resulting in a negative value and triggering ASSERT(arena_blocks >= 0).
* Fix hs_try_putmvar losing track of running capDylan Yudaken2020-02-081-0/+3
| | | | If hs_try_putmvar was called through an unsafe import, it would lose track of the running cap causing a deadlock
* fixup! fixup! Do not build StgCRunAsm.S for unregisterised buildsStefan Schulze Frielinghaus2020-02-041-0/+2
|
* Do not build StgCRunAsm.S for unregisterised buildsStefan Schulze Frielinghaus2020-02-041-0/+4
| | | | | | For unregisterised builds StgRun/StgReturn are implemented via a mini interpreter in StgCRun.c and therefore would collide with the implementations in StgCRunAsm.S.
* rts/M32Alloc: Don't attempt to unmap non-existent pagesBen Gamari2020-02-011-1/+3
| | | | | | The m32 allocator's `pages` list may contain NULLs in the case that the page was flushed. Some `munmap` implementations (e.g. FreeBSD's) don't like it if we pass them NULL. Don't do that.
* Fix rts allocateExec() on NetBSDPHO2020-01-251-2/+3
| | | | | | Similar to SELinux, NetBSD "PaX mprotect" prohibits marking a page mapping both writable and executable at the same time. Use libffi which knows how to work around it.
* Module hierarchy: Cmm (cf #13009)Sylvain Henry2020-01-258-8/+8
|
* Fix chaining tagged and untagged ptrs in compacting GCÖmer Sinan Ağacan2020-01-251-67/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently compacting GC has the invariant that in a chain all fields are tagged the same. However this does not really hold: root pointers are not tagged, so when we thread a root we initialize a chain without a tag. When the pointed objects is evaluated and we have more pointers to it from the heap, we then add *tagged* fields to the chain (because pointers to it from the heap are tagged), ending up chaining fields with different tags (pointers from roots are NOT tagged, pointers from heap are). This breaks the invariant and as a result compacting GC turns tagged pointers into non-tagged. This later causes problem in the generated code where we do reads assuming that the pointer is aligned, e.g. 0x7(%rax) -- assumes that pointer is tagged 1 which causes misaligned reads. This caused #17088. We fix this using the "pointer tagging for large families" patch (#14373, !1742): - With the pointer tagging patch the GC can know what the tagged pointer to a CONSTR should be (previously we'd need to know the family size -- large families are always tagged 1, small families are tagged depending on the constructor). - Since we now know what the tags should be we no longer need to store the pointer tag in the info table pointers when forming chains in the compacting GC. As a result we no longer need to tag pointers in chains with 1/2 depending on whether the field points to an info table pointer, or to another field: an info table pointer is always tagged 0, everything else in the chain is tagged 1. The lost tags in pointers can be retrieved by looking at the info table. Finally, instead of using tag 1 for fields and tag 0 for info table pointers, we use two different tags for fields: - 1 for fields that have untagged pointers - 2 for fields that have tagged pointers When unchaining we then look at the pointer to a field, and depending on its tag we either leave a tagged pointer or an untagged pointer in the field. This allows chaining untagged and tagged fields together in compacting GC. Fixes #17088 Nofib results ------------- Binaries are smaller because of smaller `Compact.c` code. make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" EXTRA_HC_OPTS="-with-rtsopts=-c" NoFibRuns=1 -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- CS -0.3% 0.0% +0.0% +0.0% +0.0% CSD -0.3% 0.0% +0.0% +0.0% +0.0% FS -0.3% 0.0% +0.0% -0.0% -0.0% S -0.3% 0.0% +5.4% +0.8% +3.9% VS -0.3% 0.0% +0.0% -0.0% -0.0% VSD -0.3% 0.0% -0.0% -0.0% -0.2% VSM -0.3% 0.0% +0.0% +0.0% +0.0% anna -0.1% 0.0% +0.0% +0.0% +0.0% ansi -0.3% 0.0% +0.1% +0.0% +0.0% atom -0.2% 0.0% +0.0% +0.0% +0.0% awards -0.2% 0.0% +0.0% 0.0% -0.0% banner -0.3% 0.0% +0.0% +0.0% +0.0% bernouilli -0.3% 0.0% +0.1% +0.0% +0.0% binary-trees -0.2% 0.0% +0.0% 0.0% +0.0% boyer -0.3% 0.0% +0.2% +0.0% +0.0% boyer2 -0.2% 0.0% +0.2% +0.1% +0.0% bspt -0.2% 0.0% +0.0% +0.0% +0.0% cacheprof -0.2% 0.0% +0.0% +0.0% +0.0% calendar -0.3% 0.0% +0.0% +0.0% +0.0% cichelli -0.3% 0.0% +1.1% +0.2% +0.5% circsim -0.2% 0.0% +0.0% -0.0% -0.0% clausify -0.3% 0.0% +0.0% -0.0% -0.0% comp_lab_zift -0.2% 0.0% +0.0% +0.0% +0.0% compress -0.3% 0.0% +0.0% +0.0% +0.0% compress2 -0.3% 0.0% +0.0% -0.0% -0.0% constraints -0.3% 0.0% +0.2% +0.1% +0.1% cryptarithm1 -0.3% 0.0% +0.0% -0.0% 0.0% cryptarithm2 -0.3% 0.0% +0.0% +0.0% +0.0% cse -0.3% 0.0% +0.0% +0.0% +0.0% digits-of-e1 -0.3% 0.0% +0.0% +0.0% +0.0% digits-of-e2 -0.3% 0.0% +0.0% +0.0% -0.0% dom-lt -0.2% 0.0% +0.0% +0.0% +0.0% eliza -0.2% 0.0% +0.0% +0.0% +0.0% event -0.3% 0.0% +0.1% +0.0% -0.0% exact-reals -0.2% 0.0% +0.0% +0.0% +0.0% exp3_8 -0.3% 0.0% +0.0% +0.0% +0.0% expert -0.2% 0.0% +0.0% +0.0% +0.0% fannkuch-redux -0.3% 0.0% -0.0% -0.0% -0.0% fasta -0.3% 0.0% +0.0% +0.0% +0.0% fem -0.2% 0.0% +0.1% +0.0% +0.0% fft -0.2% 0.0% +0.0% -0.0% -0.0% fft2 -0.2% 0.0% +0.0% -0.0% +0.0% fibheaps -0.3% 0.0% +0.0% -0.0% -0.0% fish -0.3% 0.0% +0.0% +0.0% +0.0% fluid -0.2% 0.0% +0.4% +0.1% +0.1% fulsom -0.2% 0.0% +0.0% +0.0% +0.0% gamteb -0.2% 0.0% +0.1% +0.0% +0.0% gcd -0.3% 0.0% +0.0% +0.0% +0.0% gen_regexps -0.3% 0.0% +0.0% -0.0% -0.0% genfft -0.3% 0.0% +0.0% +0.0% +0.0% gg -0.2% 0.0% +0.7% +0.3% +0.2% grep -0.2% 0.0% +0.0% +0.0% +0.0% hidden -0.2% 0.0% +0.0% +0.0% +0.0% hpg -0.2% 0.0% +0.1% +0.0% +0.0% ida -0.3% 0.0% +0.0% +0.0% +0.0% infer -0.2% 0.0% +0.0% -0.0% -0.0% integer -0.3% 0.0% +0.0% +0.0% +0.0% integrate -0.2% 0.0% +0.0% +0.0% +0.0% k-nucleotide -0.2% 0.0% +0.0% +0.0% -0.0% kahan -0.3% 0.0% -0.0% -0.0% -0.0% knights -0.3% 0.0% +0.0% -0.0% -0.0% lambda -0.3% 0.0% +0.0% -0.0% -0.0% last-piece -0.3% 0.0% +0.0% +0.0% +0.0% lcss -0.3% 0.0% +0.0% +0.0% 0.0% life -0.3% 0.0% +0.0% -0.0% -0.0% lift -0.2% 0.0% +0.0% +0.0% +0.0% linear -0.2% 0.0% +0.0% +0.0% +0.0% listcompr -0.3% 0.0% +0.0% +0.0% +0.0% listcopy -0.3% 0.0% +0.0% +0.0% +0.0% maillist -0.3% 0.0% +0.0% -0.0% -0.0% mandel -0.2% 0.0% +0.0% +0.0% +0.0% mandel2 -0.3% 0.0% +0.0% +0.0% +0.0% mate -0.2% 0.0% +0.0% +0.0% +0.0% minimax -0.3% 0.0% +0.0% +0.0% +0.0% mkhprog -0.2% 0.0% +0.0% +0.0% +0.0% multiplier -0.3% 0.0% +0.0% -0.0% -0.0% n-body -0.2% 0.0% -0.0% -0.0% -0.0% nucleic2 -0.2% 0.0% +0.0% +0.0% +0.0% para -0.2% 0.0% +0.0% -0.0% -0.0% paraffins -0.3% 0.0% +0.0% -0.0% -0.0% parser -0.2% 0.0% +0.0% +0.0% +0.0% parstof -0.2% 0.0% +0.8% +0.2% +0.2% pic -0.2% 0.0% +0.1% -0.1% -0.1% pidigits -0.3% 0.0% +0.0% +0.0% +0.0% power -0.2% 0.0% +0.0% -0.0% -0.0% pretty -0.3% 0.0% -0.0% -0.0% -0.1% primes -0.3% 0.0% +0.0% +0.0% -0.0% primetest -0.2% 0.0% +0.0% -0.0% -0.0% prolog -0.3% 0.0% +0.0% -0.0% -0.0% puzzle -0.3% 0.0% +0.0% +0.0% +0.0% queens -0.3% 0.0% +0.0% +0.0% +0.0% reptile -0.2% 0.0% +0.2% +0.1% +0.0% reverse-complem -0.3% 0.0% +0.0% +0.0% +0.0% rewrite -0.3% 0.0% +0.0% -0.0% -0.0% rfib -0.2% 0.0% +0.0% +0.0% -0.0% rsa -0.2% 0.0% +0.0% +0.0% +0.0% scc -0.3% 0.0% -0.0% -0.0% -0.1% sched -0.3% 0.0% +0.0% +0.0% +0.0% scs -0.2% 0.0% +0.1% +0.0% +0.0% simple -0.2% 0.0% +3.4% +1.0% +1.8% solid -0.2% 0.0% +0.0% +0.0% +0.0% sorting -0.3% 0.0% +0.0% +0.0% +0.0% spectral-norm -0.2% 0.0% -0.0% -0.0% -0.0% sphere -0.2% 0.0% +0.0% +0.0% +0.0% symalg -0.2% 0.0% +0.0% +0.0% +0.0% tak -0.3% 0.0% +0.0% +0.0% -0.0% transform -0.2% 0.0% +0.2% +0.1% +0.1% treejoin -0.3% 0.0% +0.2% -0.0% -0.1% typecheck -0.3% 0.0% +0.0% +0.0% +0.0% veritas -0.1% 0.0% +0.0% +0.0% +0.0% wang -0.2% 0.0% +0.0% -0.0% -0.0% wave4main -0.3% 0.0% +0.0% -0.0% -0.0% wheel-sieve1 -0.3% 0.0% +0.0% -0.0% -0.0% wheel-sieve2 -0.3% 0.0% +0.0% -0.0% -0.0% x2n1 -0.3% 0.0% +0.0% +0.0% +0.0% -------------------------------------------------------------------------------- Min -0.3% 0.0% -0.0% -0.1% -0.2% Max -0.1% 0.0% +5.4% +1.0% +3.9% Geometric Mean -0.3% -0.0% +0.1% +0.0% +0.1% -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- circsim -0.2% 0.0% +1.6% +0.4% +0.7% constraints -0.3% 0.0% +4.3% +1.5% +2.3% fibheaps -0.3% 0.0% +3.5% +1.2% +1.3% fulsom -0.2% 0.0% +3.6% +1.2% +1.8% gc_bench -0.3% 0.0% +4.1% +1.3% +2.3% hash -0.3% 0.0% +6.6% +2.2% +3.6% lcss -0.3% 0.0% +0.7% +0.2% +0.7% mutstore1 -0.3% 0.0% +4.8% +1.4% +2.8% mutstore2 -0.3% 0.0% +3.4% +1.0% +1.7% power -0.2% 0.0% +2.7% +0.6% +1.9% spellcheck -0.3% 0.0% +1.1% +0.4% +0.4% -------------------------------------------------------------------------------- Min -0.3% 0.0% +0.7% +0.2% +0.4% Max -0.2% 0.0% +6.6% +2.2% +3.6% Geometric Mean -0.3% +0.0% +3.3% +1.0% +1.8% Metric changes -------------- While it sounds ridiculous, this change causes increased allocations in the following tests. We concluded that this change can't cause a difference in allocations and decided to land this patch. Fluctuations in "bytes allocated" metric is tracked in #17686. Metric Increase: Naperian T10547 T12150 T12234 T12425 T13035 T5837 T6048
* Fix +RTS -Z flag documentationÖmer Sinan Ağacan2020-01-201-1/+1
| | | | | | | | | Stack squeezing is done on context switch, not on GC or stack overflow. Fix the documentation. Fixes #17685 [ci skip]
* Fix more typos, via an improved Levenshtein-style correctorBrian Wignall2020-01-129-15/+15
|
* Fix overflow.Tamar Christina2020-01-061-1/+1
|
* Fix typos, via a Levenshtein-style correctorBrian Wignall2020-01-0415-15/+15
|
* Fix some sloppy indentationKevin Buhr2019-12-311-3/+3
|
* Add additional Note explaining the -Iw flagKevin Buhr2019-12-311-2/+49
|
* Add "-Iw" RTS flag for minimum wait between idle GCs (#11134)Kevin Buhr2019-12-312-20/+42
|
* rts: Fix --debug-numa mode under DockerBen Gamari2019-12-302-0/+3
| | | | | | | | | As noted in #17606, Docker disallows the get_mempolicy syscall by default. This caused numerous tests to fail under CI in the `debug_numa` way. Avoid this by disabling the NUMA probing logic when --debug-numa is in use, instead setting n_numa_nodes in RtsFlags.c. Fixes #17606.
* rts: Error on invalid --numa flagsBen Gamari2019-12-301-1/+6
| | | | | Previously things like `+RTS --numa-debug` would enable NUMA support, despite being an invalid flag.
* rts: Ensure that nonmoving gc isn't used with profilingBen Gamari2019-12-301-0/+5
|
* Remove outdated commentSylvain Henry2019-12-241-4/+2
|
* Handle large ARR_WORDS in heap census (fix #17572)Sylvain Henry2019-12-191-0/+16
| | | | | | | | We can do a heap census with a non-profiling RTS. With a non-profiling RTS we don't zero superfluous bytes of shrunk arrays hence a need to handle the case specifically to avoid a crash. Revert part of a586b33f8e8ad60b5c5ef3501c89e9b71794bbed
* Revert "rts: Drop redundant flags for libffi"Ben Gamari2019-12-121-3/+8
| | | | | | | This seems to have regressed builds using `--with-system-libffi` (#17520). This reverts commit 3ce18700f80a12c48a029b49c6201ad2410071bb.