path: root/rts
Commit message (Author, Age, Files, Lines)
* Load .lo as well. (Moritz Angermann, 2020-06-15, 1 file, -0/+1)
| Some archives contain so-called linker objects, with the affectionate .lo suffix. For example, the musl libc.a comes in that form. We still want to load those objects, so we should not discard them and should look for .lo as well. Ultimately we might want to fix this properly by looking at the file magic.
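The suffix check described above can be sketched as follows. This is only an illustration, not the RTS linker's code: `has_suffix` and `is_loadable_member` are hypothetical names, and the real loader may apply further checks to archive members.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `name` ends with `suffix`. */
bool has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name);
    size_t s = strlen(suffix);
    return n >= s && strcmp(name + (n - s), suffix) == 0;
}

/* Hypothetical predicate: accept both ".o" and ".lo" archive members.
 * "foo.lo" does not end in ".o", so the two suffixes need separate checks. */
bool is_loadable_member(const char *name)
{
    return has_suffix(name, ".o") || has_suffix(name, ".lo");
}
```

As the commit message notes, inspecting the file magic would be more robust than matching on names.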
* winio: Add Atomic Exchange PrimOp and implement Atomic Ptr exchanges. (Tamar Christina, 2020-06-14, 2 files, -0/+16)
| The initial version was rewritten by Tamar Christina. It was rewritten in large parts by Andreas Klebinger.
|
| Co-authored-by: Andreas Klebinger <klebinger.andreas@gmx.at>
* Range is actually +/-2^32, not +/-2^31 (Moritz Angermann, 2020-06-07, 1 file, -5/+7)
| See also: https://static.docs.arm.com/ihi0056/g/aaelf64.pdf
* [linker] Adds void printLoadedObjects(void); [branch: wip/angerman/print-loaded-objects] (Moritz Angermann, 2020-06-04, 2 files, -2/+30)
| This allows us to dump in-memory object code locations for debugging.
|
| Fixup printLoadedObjects prototype
* rts: Add Windows-specific implementation of rtsSleep (Ben Gamari, 2020-06-04, 1 file, -2/+9)
| Previously we would use the POSIX path, which uses `nanosleep`. However, it turns out that `nanosleep` is provided by `libpthread` on Windows. In general we don't want to incur such a dependency. Avoid this by simply using `Sleep` on Windows.
|
| Fixes #18272.
* Clean up file paths for new module hierarchy (Takenobu Tani, 2020-06-01, 5 files, -6/+6)
| This updates comments only.
|
| This patch replaces file references according to new module hierarchy.
|
| See also:
| * https://gitlab.haskell.org/ghc/ghc/-/wikis/Make-GHC-codebase-more-modular
| * https://gitlab.haskell.org/ghc/ghc/issues/13009
* nonmoving: Optimise log2_ceil (Ben Gamari, 2020-06-01, 1 file, -7/+1)
|
* Cleanup OVERWRITING_CLOSURE logic (Daniel Gröber, 2020-06-01, 2 files, -7/+7)
| The code is just more confusing than it needs to be. We don't need to mix the threaded check with the ldv profiling check since ldv's init already checks for this. Hence they can be two separate checks. Taking the sanity checking into account is also cleaner via DebugFlags.sanity. No need for checking the DEBUG define.
|
| The ZERO_SLOP_FOR_LDV_PROF and ZERO_SLOP_FOR_SANITY_CHECK definitions the old code had also make things a lot more opaque IMO so I removed those.
* rts: Drop compatibility shims for Windows Vista (Ben Gamari, 2020-05-30, 2 files, -52/+6)
| We can now assume that the thread and processor group interfaces are available.
* rts: Teach getNumProcessors to return available processors (Ben Gamari, 2020-05-30, 1 file, -10/+34)
| Previously we would report the number of physical processors, which can be quite wrong in a containerized setting. Now we rather return how many processors are in our affinity mask when possible.
|
| I also refactored the code to prefer platform-specific since this will report logical CPUs instead of physical (using `machdep.cpu.thread_count` on Darwin and `cpuset_getaffinity` on FreeBSD).
|
| Fixes #14781.
* Always define USE_PTHREAD_FOR_ITIMER for FreeBSD. (Gleb Popov, 2020-05-29, 1 file, -1/+1)
|
* eventlog: Fix racy flushing (Ben Gamari, 2020-05-27, 1 file, -1/+20)
| Previously no attempt was made to avoid multiple threads writing their capability-local eventlog buffers to the eventlog writer simultaneously. This could result in multiple eventlog streams being interleaved. Fix this by documenting that the EventLogWriter's write() and flush() functions may be called reentrantly and fix the default writer to protect its FILE* by a mutex.
|
| Fixes #18210.
* nonmoving: Optimise the write barrier (Ben Gamari, 2020-05-21, 1 file, -21/+18)
|
* rts/CNF: Fix fixup comparison function (Ben Gamari, 2020-05-13, 1 file, -2/+3)
| Previously we would implicitly convert the difference between two words to an int, resulting in an integer overflow on 64-bit machines.
|
| Fixes #16992
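The pitfall described above is the familiar "return a - b" comparator. A minimal sketch of an overflow-free alternative follows; `cmp_words` is a hypothetical name and this is not the code from the patch. The assumption is that the values being compared are word-sized unsigned integers.

```c
#include <stdint.h>

/* Hypothetical comparator in the style expected by qsort/bsearch.
 * Returning (int)(x - y) would truncate a word-sized difference to int,
 * which can flip the sign on 64-bit machines. Comparing explicitly avoids
 * any arithmetic on the difference. */
int cmp_words(const void *a, const void *b)
{
    uintptr_t x = *(const uintptr_t *)a;
    uintptr_t y = *(const uintptr_t *)b;
    return (x > y) - (x < y);   /* yields -1, 0 or 1 */
}
```

The `(x > y) - (x < y)` idiom only ever produces -1, 0 or 1, so the result always fits in an int regardless of how far apart the operands are.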
* rts: Make non-existent linker search path merely a warning (Ben Gamari, 2020-05-09, 1 file, -2/+2)
| As noted in #18105, previously this resulted in a rather intrusive error message. This is in contrast to the general expectation that search paths are merely places to look, not places that must exist.
|
| Fixes #18105.
* Fix unboxed-sums GC ptr-slot rubbish value (#17791) (Sylvain Henry, 2020-05-09, 7 files, -11/+10)
| This patch allows boot libraries to use unboxed sums without implicitly depending on `base` package because of `absentSumFieldError`.
|
| See updated Note [aBSENT_SUM_FIELD_ERROR_ID] in GHC.Core.Make
* Linker.c: initialize n_symbols of ObjectCode with other fields (Ömer Sinan Ağacan, 2020-05-08, 1 file, -0/+1)
|
* Remove unused SEGMENT_PROT_RWX (Greg Steuck, 2020-05-08, 1 file, -2/+0)
| It's been unused for a year and is problematic on any OS which requires W^X for security.
* ELF linker: increment curSymbol after filling in fields of current entry (Ömer Sinan Ağacan, 2020-05-06, 1 file, -1/+2)
| The bug was introduced in a8b7cef4d45 which added a field to the `symbols` array elements and then updated this code incorrectly:
|
|     - oc->symbols[curSymbol++] = nm;
|     + oc->symbols[curSymbol++].name = nm;
|     + oc->symbols[curSymbol].addr = symbol->addr;
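In the diff quoted above, the post-increment on the first assignment means the `addr` store lands in the next entry. A minimal sketch of the corrected ordering, using a hypothetical `Symbol` record and `add_symbol` helper rather than the linker's actual types:

```c
#include <stddef.h>

/* Hypothetical stand-in for one element of the linker's `symbols` array. */
typedef struct {
    const char *name;
    void       *addr;
} Symbol;

/* Fill in every field of the current entry first, then advance the cursor. */
void add_symbol(Symbol *symbols, size_t *cur, const char *name, void *addr)
{
    symbols[*cur].name = name;
    symbols[*cur].addr = addr;
    (*cur)++;   /* increment only once both fields are written */
}
```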
* nonmoving: Fix handling of dirty objects (Ben Gamari, 2020-05-06, 4 files, -3/+115)
| Previously we (incorrectly) relied on failed_to_evac to be "precise". That is, we expected it to only be true if *all* of an object's fields lived outside of the non-moving heap. However, this does not match the behavior of failed_to_evac, which is true if *any* of the object's fields weren't promoted (meaning that some others *may* live in the non-moving heap).
|
| This is problematic as we skip the non-moving write barrier for dirty objects (which we can only safely do if *all* fields point outside of the non-moving heap).
|
| Clearly this arises due to a fundamental difference in the behavior expected of failed_to_evac in the moving and non-moving collector. e.g., in the moving collector it is always safe to conservatively say failed_to_evac=true whereas in the non-moving collector the safe value is false.
|
| This issue went unnoticed as I never wrote down the dirtiness invariant enforced by the non-moving collector. We now define this invariant as
|
|     An object being marked as dirty implies that all of its fields are on the mark queue (or, equivalently, update remembered set).
|
| To maintain this invariant we teach nonmovingScavengeOne to push the fields of objects which we fail to evacuate to the update remembered set. This is a simple and reasonably cheap solution and avoids the complexity and fragility that other, more strict alternative invariants would require.
|
| All of this is described in a new Note, Note [Dirty flags in the non-moving collector] in NonMoving.c.
* nonmoving: Fix incorrect failed_to_evac value during deadlock gc (Ben Gamari, 2020-05-06, 1 file, -4/+3)
| Previously we would incorrectly set the failed_to_evac flag if we evacuated a value due to a deadlock GC. This would cause us to mark more things as dirty than strictly necessary. It also turned up a nasty bug which I will fix next.
* rts: Zero block flags with -DZ (Ben Gamari, 2020-05-06, 1 file, -0/+12)
| Block flags are very useful for determining the state of a block. However, some block allocator users don't touch them, leading to misleading values. Ensure that we zero them when zero-on-gc is set. This is safe and makes the flags more useful during debugging.
* rts: Enable tracing of nonmoving heap census with -ln (Ben Gamari, 2020-05-03, 2 files, -0/+5)
| Previously this was not easily available to the user. Fix this. Non-moving collection lifecycle events are now reported with -lg.
* nonmoving: Eagerly flush all capabilities' update remembered sets (Ben Gamari, 2020-04-30, 1 file, -2/+4)
| (cherry picked from commit 2fa79119570b358a4db61446396889b8260d7957)
* nonmoving: Explicitly memoize block count (Ben Gamari, 2020-04-30, 1 file, -3/+2)
| A profile cast doubt on whether the compiler hoisted the bound out of the loop as I would have expected here. It turns out it did, but nevertheless it seems clearer to just do this manually.
* nonmoving: Clear bitmap after initializing block size (Ben Gamari, 2020-04-30, 1 file, -1/+1)
| Previously nonmovingInitSegment would clear the bitmap before initializing the segment's block size. This is broken since nonmovingClearBitmap looks at the segment's block size to determine how much bitmap to clear.
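The ordering dependency described above can be illustrated with a simplified model. Everything here is hypothetical (`Segment`, `SEGMENT_BYTES`, `block_count`, `clear_bitmap`, `init_segment`) and much simpler than the real non-moving segment layout; the only point is that the clear step reads the block size, so the block size has to be written first.

```c
#include <stdint.h>
#include <string.h>

#define SEGMENT_BYTES 1024u   /* hypothetical segment size, for illustration */

/* Hypothetical, simplified segment: one bitmap byte per block. */
typedef struct {
    uint8_t log_block_size;          /* log2 of the block size in bytes */
    uint8_t bitmap[SEGMENT_BYTES];   /* sized for the smallest block size */
} Segment;

/* Number of blocks (and so bitmap bytes in use) for the current block size. */
unsigned block_count(const Segment *seg)
{
    return SEGMENT_BYTES >> seg->log_block_size;
}

/* Clears only as much bitmap as the current block size requires. */
void clear_bitmap(Segment *seg)
{
    memset(seg->bitmap, 0, block_count(seg));
}

void init_segment(Segment *seg, uint8_t log_block_size)
{
    seg->log_block_size = log_block_size;  /* set the block size first... */
    clear_bitmap(seg);                     /* ...because clearing depends on it */
}
```

With the two statements in `init_segment` swapped, `clear_bitmap` would size its `memset` from whatever stale value `log_block_size` held before initialization.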
* Tweak includes in non-moving GC headers (Ömer Sinan Ağacan, 2020-04-26, 3 files, -4/+1)
| We don't use hash tables in non-moving GC so remove the includes.
|
| This breaks Compact.c as existing includes no longer include Hash.h, so include Hash.h explicitly in Compact.c.
* RTS: workaround a Linux kernel bug in timerfd (Sylvain Henry, 2020-04-22, 1 file, -4/+12)
| Reading a timerfd may return 0: https://lkml.org/lkml/2019/8/16/335.
|
| This is currently undocumented behavior and documentation "won't happen anytime soon" (https://lkml.org/lkml/2020/2/13/295).
|
| With this patch, we just ignore the result instead of crashing. It may fix #18033 but we can't be sure because we don't have enough information.
|
| See also this discussion about the kernel bug: https://github.com/Azure/sonic-swss-common/pull/302/files/1f070e7920c2e5d63316c0105bf4481e73d72dc9
* Fix build warning; add more informative information to the linker; fix linker for empty sections (Moritz Angermann, 2020-04-22, 2 files, -5/+15)
|
* rts: Ensure that sigaction structs are initialized (Ben Gamari, 2020-04-22, 1 file, -2/+3)
| I noticed these may have uninitialized fields when looking into #18037. The reporter says that zeroing them doesn't fix the MSAN failures they observe but zeroing them is the right thing to do regardless.
* Modules (#13009) (Sylvain Henry, 2020-04-18, 2 files, -2/+2)
| * SysTools
| * Parser
| * GHC.Builtin
| * GHC.Iface.Recomp
| * Settings
|
| Update Haddock submodule
|
| Metric Decrease:
|     Naperian
|     parsing001
* rts: ProfHeap: Fix wrong time in last heap profile sample (Daniel Gröber, 2020-04-15, 1 file, -3/+4)
| We've had this longstanding issue in the heap profiler, where the time of the last sample in the profile is sometimes way off, causing the rendered graph to be quite useless for long runs.
|
| It seems to me the problem is that we use mut_user_time() for the last sample as opposed to getRTSStats(), which we use when calling heapProfile() in GC.c.
|
| The former is equivalent to getProcessCPUTime() but the latter does some additional stuff:
|
|     getProcessCPUTime() - end_init_cpu - stats.gc_cpu_ns - stats.nonmoving_gc_cpu_ns
|
| So to fix this just use getRTSStats() in both places.
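The subtraction quoted above can be written out as a small function. This is a sketch only: `CpuStats` and `mutator_cpu_time` are hypothetical names, the fields mirror the terms in the commit message, and the values are assumed to be nanosecond counts.

```c
#include <stdint.h>

typedef int64_t Time;   /* assumed: nanoseconds */

/* Hypothetical container for the terms named in the commit message. */
typedef struct {
    Time end_init_cpu;          /* CPU time consumed by the end of RTS init */
    Time gc_cpu_ns;             /* CPU time spent in GC */
    Time nonmoving_gc_cpu_ns;   /* CPU time spent in non-moving GC */
} CpuStats;

/* Mutator CPU time: process CPU time minus init and both GC components. */
Time mutator_cpu_time(Time process_cpu, const CpuStats *s)
{
    return process_cpu - s->end_init_cpu - s->gc_cpu_ns - s->nonmoving_gc_cpu_ns;
}
```

Using the raw process CPU time for one sample and the adjusted value for all others would put the two on different time axes, which matches the skewed final sample the message describes.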
* rts: Assert LDV_recordDead is not called for inherently used closures (Daniel Gröber, 2020-04-14, 2 files, -24/+37)
| The comments make it clear LDV_recordDead should not be called for inherently used closures, so add an assertion to codify this fact.
* Remove call to LDV_RECORD_CREATE for array resizing (Daniel Gröber, 2020-04-14, 1 file, -15/+10)
|
* rts: Fix nomenclature in OVERWRITING_CLOSURE macros (Daniel Gröber, 2020-04-14, 1 file, -4/+16)
| The additional commentary introduced by commit 8916e64e5437 ("Implement shrinkSmallMutableArray# and resizeSmallMutableArray#.") unfortunately got this wrong. We set 'prim' to true in overwritingClosureOfs because we _don't_ want to call LDV_recordDead().
|
| The reason is because of this "inherently used" distinction made in the LDV profiler so I rename the variable to be more appropriate.
* rts: Underline some Notes as is conventional (Daniel Gröber, 2020-04-14, 3 files, -0/+3)
|
* rts: allocatePinned: Fix confusion about word/byte units (Daniel Gröber, 2020-04-14, 1 file, -19/+22)
|
* rts: Expand and add more notes regarding slop (Daniel Gröber, 2020-04-14, 4 files, -8/+74)
|
* Zero out pinned block alignment slop when profiling (Daniel Gröber, 2020-04-14, 2 files, -23/+53)
| The heap profiler currently cannot traverse pinned blocks because of alignment slop. This used to just be a minor annoyance as the whole block is accounted into a special cost center rather than the respective object's CCS, cf. #7275. However for the new root profiler we would like to be able to visit _every_ closure on the heap. We need to do this so we can get rid of the current 'flip' bit hack in the heap traversal code.
|
| Since info pointers are always non-zero we can in principle skip all the slop in the profiler if we can rely on it being zeroed. This assumption caused problems in the past though, commit a586b33f8e ("rts: Correct handling of LARGE ARR_WORDS in LDV profiler"), part of !1118, tried to use the same trick for BF_LARGE objects but neglected to take into account that shrink*Array# functions don't ensure that slop is zeroed when not compiling with profiling.
|
| Later, commit 0c114c6599 ("Handle large ARR_WORDS in heap census (fix as we will only be assuming slop is zeroed when profiling is on.
|
| This commit also reduces the amount of slop we introduce in the first place by calculating the needed alignment before doing the allocation for small objects where we know the next available address. For large objects we don't know how much alignment we'll have to do yet since those details are hidden behind the allocateMightFail function so there we continue to allocate the maximum additional words we'll need to do the alignment.
|
| So we don't have to duplicate all this logic in the cmm code we pull it into the RTS allocatePinned function instead.
|
| Metric Decrease:
|     T7257
|     haddock.Cabal
|     haddock.base
* rts: Don't mark evacuate_large as inline (Ben Gamari, 2020-04-14, 1 file, -1/+1)
| This function has two callsites and is quite large. GCC consequently decides not to inline and warns instead. Given the situation, I can't blame it. Let's just remove the inline specifier.
* StgCRun: Enable unwinding only on Linux (Ben Gamari, 2020-04-14, 1 file, -4/+11)
| It's broken on macOS and SmartOS due to assembler differences (#15207) so let's be conservative in enabling it. Also, refactor things to make the intent clearer.
* Small change to the windows ticker. (Andreas Klebinger, 2020-04-14, 1 file, -1/+7)
| We already have a function to go from time to ms so use it. Also expand on the state of timer resolution.
* Rts: show errno on failure (#18033) (Sylvain Henry, 2020-04-09, 1 file, -5/+5)
|
* Fix CNF handling in compacting GC (Ömer Sinan Ağacan, 2020-04-09, 5 files, -10/+93)
| Fixes #17937
|
| Previously compacting GC simply ignored CNFs. This is mostly fine as most (see "What about small compacts?" below) CNF objects don't have outgoing pointers, and are "large" (allocated in large blocks) and large objects are not moved or compacted.
|
| However if we do GC *during* sharing-preserving compaction then the CNF will have a hash table mapping objects that have been moved to the CNF to their location in the CNF, to be able to preserve sharing.
|
| This case is handled in the copying collector, in `scavenge_compact`, where we evacuate hash table entries and then rehash the table.
|
| Compacting GC ignored this case.
|
| We now visit CNFs in all generations when threading pointers to the compacted heap and thread hash table keys. A visited CNF is added to the list `nfdata_chain`. After compaction is done, we re-visit the CNFs in that list and rehash the tables.
|
| The overhead is minimal: the list is static in `Compact.c`, and link field is added to `StgCompactNFData` closure. Programs that don't use CNFs should not be affected.
|
| To test this CNF tests are now also run in a new way 'compacting_gc', which just passes `-c` to the RTS, enabling compacting GC for the oldest generation. Before this patch the result would be:
|
|     Unexpected failures:
|        compact_gc.run          compact_gc [bad exit code (139)] (compacting_gc)
|        compact_huge_array.run  compact_huge_array [bad exit code (1)] (compacting_gc)
|
| With this patch all tests pass. I can also pass `-c -DS` without any failures.
|
| What about small compacts? Small CNFs are still not handled by the compacting GC. However so far I'm unable to write a test that triggers a runtime panic ("update_fwd: unknown/strange object") by allocating a small CNF in a compacted heap. It's possible that I'm missing something and it's not possible to have a small CNF.
|
| NoFib Results:
|
| --------------------------------------------------------------------------------
|         Program           Size    Allocs    Instrs     Reads    Writes
| --------------------------------------------------------------------------------
|              CS          +0.1%      0.0%      0.0%     +0.0%     +0.0%
|             CSD          +0.1%      0.0%      0.0%      0.0%      0.0%
|              FS          +0.1%      0.0%      0.0%      0.0%      0.0%
|               S          +0.1%      0.0%      0.0%      0.0%      0.0%
|              VS          +0.1%      0.0%      0.0%      0.0%      0.0%
|             VSD          +0.1%      0.0%     +0.0%     +0.0%     -0.0%
|             VSM          +0.1%      0.0%     +0.0%     -0.0%      0.0%
|            anna          +0.0%      0.0%     -0.0%     -0.0%     -0.0%
|            ansi          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|            atom          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|          awards          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|          banner          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|      bernouilli          +0.1%      0.0%      0.0%     -0.0%     +0.0%
|    binary-trees          +0.1%      0.0%     -0.0%     -0.0%      0.0%
|           boyer          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|          boyer2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|            bspt          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
|       cacheprof          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
|        calendar          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|        cichelli          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|         circsim          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|        clausify          +0.1%      0.0%     -0.0%     +0.0%     +0.0%
|   comp_lab_zift          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|        compress          +0.1%      0.0%     +0.0%     +0.0%      0.0%
|       compress2          +0.1%      0.0%     -0.0%      0.0%      0.0%
|     constraints          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|    cryptarithm1          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|    cryptarithm2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|             cse          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|    digits-of-e1          +0.1%      0.0%     +0.0%     -0.0%     -0.0%
|    digits-of-e2          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
|          dom-lt          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|           eliza          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|           event          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|     exact-reals          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|          exp3_8          +0.1%      0.0%     +0.0%     -0.0%      0.0%
|          expert          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|  fannkuch-redux          +0.1%      0.0%     -0.0%      0.0%      0.0%
|           fasta          +0.1%      0.0%     -0.0%     +0.0%     +0.0%
|             fem          +0.1%      0.0%     -0.0%     +0.0%      0.0%
|             fft          +0.1%      0.0%     -0.0%     +0.0%     +0.0%
|            fft2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|        fibheaps          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|            fish          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|           fluid          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
|          fulsom          +0.1%      0.0%     -0.0%     +0.0%      0.0%
|          gamteb          +0.1%      0.0%     +0.0%     +0.0%      0.0%
|             gcd          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|     gen_regexps          +0.1%      0.0%     -0.0%     +0.0%      0.0%
|          genfft          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|              gg          +0.1%      0.0%      0.0%     +0.0%     +0.0%
|            grep          +0.1%      0.0%     -0.0%     +0.0%     +0.0%
|          hidden          +0.1%      0.0%     +0.0%     -0.0%      0.0%
|             hpg          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
|             ida          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|           infer          +0.1%      0.0%     +0.0%      0.0%     -0.0%
|         integer          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|       integrate          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
|    k-nucleotide          +0.1%      0.0%     +0.0%     +0.0%      0.0%
|           kahan          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|         knights          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
|          lambda          +0.1%      0.0%     +0.0%     +0.0%     -0.0%
|      last-piece          +0.1%      0.0%     +0.0%      0.0%      0.0%
|            lcss          +0.1%      0.0%     +0.0%     +0.0%      0.0%
|            life          +0.1%      0.0%     -0.0%     +0.0%     +0.0%
|            lift          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|          linear          +0.1%      0.0%     -0.0%     +0.0%      0.0%
|       listcompr          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|        listcopy          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|        maillist          +0.1%      0.0%     +0.0%     -0.0%     -0.0%
|          mandel          +0.1%      0.0%     +0.0%     +0.0%      0.0%
|         mandel2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|            mate          +0.1%      0.0%     +0.0%      0.0%     +0.0%
|         minimax          +0.1%      0.0%     -0.0%      0.0%     -0.0%
|         mkhprog          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|      multiplier          +0.1%      0.0%     +0.0%      0.0%      0.0%
|          n-body          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|        nucleic2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|            para          +0.1%      0.0%      0.0%     +0.0%     +0.0%
|       paraffins          +0.1%      0.0%     +0.0%     -0.0%      0.0%
|          parser          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
|         parstof          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|             pic          +0.1%      0.0%     -0.0%     -0.0%      0.0%
|        pidigits          +0.1%      0.0%     +0.0%     -0.0%     -0.0%
|           power          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|          pretty          +0.1%      0.0%     -0.0%     -0.0%     -0.1%
|          primes          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
|       primetest          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
|          prolog          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
|          puzzle          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
|          queens          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|         reptile          +0.1%      0.0%     -0.0%     -0.0%     +0.0%
| reverse-complem          +0.1%      0.0%     +0.0%      0.0%     -0.0%
|         rewrite          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
|            rfib          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|             rsa          +0.1%      0.0%     -0.0%     +0.0%     -0.0%
|             scc          +0.1%      0.0%     -0.0%     -0.0%     -0.1%
|           sched          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|             scs          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|          simple          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
|           solid          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|         sorting          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
|   spectral-norm          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|          sphere          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
|          symalg          +0.1%      0.0%     -0.0%     -0.0%     -0.0%
|             tak          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|       transform          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|        treejoin          +0.1%      0.0%     +0.0%     -0.0%     -0.0%
|       typecheck          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|         veritas          +0.0%      0.0%     +0.0%     +0.0%     +0.0%
|            wang          +0.1%      0.0%      0.0%     +0.0%     +0.0%
|       wave4main          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|    wheel-sieve1          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|    wheel-sieve2          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|            x2n1          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
| --------------------------------------------------------------------------------
|             Min          +0.0%      0.0%     -0.0%     -0.0%     -0.1%
|             Max          +0.1%      0.0%     +0.0%     +0.0%     +0.0%
|  Geometric Mean          +0.1%     -0.0%     -0.0%     -0.0%     -0.0%
|
| Bumping numbers of nonsensical perf tests:
|
| Metric Increase:
|     T12150
|     T12234
|     T12425
|     T13035
|     T5837
|     T6048
|
| It's simply not possible for this patch to increase allocations, and I've wasted enough time on these tests in the past (see #17686). I think these tests should not be perf tests, but for now I'll bump the numbers.
* rts: ProfHeap: Fix memory leak when not compiled with profiling (Daniel Gröber, 2020-04-07, 1 file, -1/+1)
| If we're doing heap profiling on an unprofiled executable we keep allocating new space in initEra via nextEra on each profiler run but we don't have a corresponding freeEra call.
|
| We do free the last era in endHeapProfiling but previous eras will have been overwritten by initEra and will never get free()ed.
|
| Metric Decrease:
|     space_leak_001
* Improve and refactor StgToCmm codegen for DataCons. (Andreas Klebinger, 2020-04-03, 1 file, -1/+1)
| We now differentiate three cases of constructor bindings:
|
| 1) Bindings which we can "replace" with a reference to an existing closure. Reference the replacement closure when accessing the binding.
| 2) Bindings which we can "replace" as above. But we still generate a closure which will be referenced by modules importing this binding.
| 3) For any other binding generate a closure. Then reference it.
|
| Before this patch 1) did only apply to local bindings and we didn't do 2) at all.
* Remove unused closure stg_IND_direct (Ömer Sinan Ağacan, 2020-04-02, 1 file, -9/+0)
|
* Fix a pointer format string in RTS (Ömer Sinan Ağacan, 2020-04-02, 1 file, -1/+1)
|
* Update sanity checking for TSOs: (Ömer Sinan Ağacan, 2020-03-16, 1 file, -36/+22)
| - Remove an invalid assumption about GC checking what_next field. The GC doesn't care about what_next at all, if a TSO is reachable then all its pointers are followed (other than global_tso, which is only followed by compacting GC).
|
| - Remove checkSTACK in checkTSO: TSO stacks will be visited in checkHeapChain, or checkLargeObjects etc.
|
| - Add an assertion in checkTSO to check that the global_link field is sane.
|
| - Did some refactoring to remove forward decls in checkGlobalTSOList and added braces around single-statement if statements.
* Fix global_link of TSOs for threads reachable via dead weaks (Ömer Sinan Ağacan, 2020-03-15, 1 file, -0/+8)
| Fixes #17785
|
| Here's how the problem occurs:
|
| - In generation 0 we have a TSO that is finished (i.e. it has no more work to do or it is killed).
|
| - The TSO only becomes reachable after collectDeadWeakPtrs().
|
| - After collectDeadWeakPtrs() we switch to WeakDone phase where we don't move TSOs to different lists anymore (like the next gen's thread list or the resurrected_threads list).
|
| - So the TSO will never be moved to a generation's thread list, but it will be promoted to generation 1.
|
| - Generation 1 is collected via mark-compact, and because the TSO is reachable it is marked, and its `global_link` field, which is bogus at this point (because the TSO is not in a list), will be threaded.
|
| - Chaos ensues.
|
| In other words, when these conditions hold:
|
| - A TSO is reachable only after collectDeadWeakPtrs()
| - It's finished (what_next is ThreadComplete or ThreadKilled)
| - It's retained by mark-compact collector (moving collector doesn't evacuate the global_link field)
|
| We end up doing random mutations on the heap because the TSO's global_link field is not valid, but it still looks like a heap pointer so we thread it during compacting GC.
|
| The fix is simple: when we traverse old_threads lists to resurrect unreachable threads, the threads that won't be resurrected currently stay on the old_threads lists. Those threads will never be visited again by MarkWeak so we now reset the global_link fields. This way compacting GC does not thread pointers to nowhere.
|
| Testing
| -------
|
| The reproducer in #17785 is quite large and hard to build, because of the dependencies, so I'm not adding a regression test.
|
| In my testing the reproducer would take less than 5 seconds to run, and once in every ~5 runs would fail with a segfault or an assertion error. In other cases it also fails with a test failure. Because the tests never fail with the bug fix, assuming the code is correct, this also means that this bug can sometimes lead to incorrect runtime results.
|
| After the fix I was able to run the reproducer repeatedly for about an hour, with no runtime crashes or test failures.
|
| To run the reproducer clone the git repo:
|
|     $ git clone https://github.com/osa1/streamly --branch ghc-segfault
|
| Then clone primitive and atomic-primops from their git repos and point to the clones in cabal.project.local. The project should then be buildable using GHC HEAD. Run the executable `properties` with `+RTS -c -DZ`.
|
| In addition to the reproducer above I run the test suite using:
|
|     $ make slowtest EXTRA_HC_OPTS="-debug -with-rtsopts=-DS \
|         -with-rtsopts=-c +RTS -c -RTS" SKIPWAY='nonmoving nonmoving_thr'
|
| This enables compacting GC always in both GHC when building the test programs and when running the test programs, and also enables sanity checking when running the test programs. This set of flags is not compatible with all tests so there are some failures, but I got the same set of failures with this patch compared to GHC HEAD.