path: root/rts/sm/Storage.c
Commit message | Author | Age | Files | Lines
* rts: Drop racy assertion | Ben Gamari | 2023-01-12 | 1 | -0/+3
  0e274c39bf836d5bb846f5fa08649c75f85326ac added an assertion in `dirty_MUT_VAR` checking that the MUT_VAR being dirtied was clean. However, this isn't necessarily the case since another thread may have raced us to dirty the object.
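  As a hedged illustration of why such an assertion can fire spuriously (plain C11 atomics, not the actual RTS code): two mutator threads may both observe the clean state before either one writes the dirty flag, so the thread that loses the race sees an already-dirty header.

      /* Sketch only: a MUT_VAR-like cell with a CLEAN/DIRTY header flag. */
      #include <stdatomic.h>

      enum { CLEAN, DIRTY };
      static _Atomic int header = CLEAN;

      void dirty_cell(void)
      {
          int old = atomic_exchange(&header, DIRTY);
          /* assert(old == CLEAN);  -- racy: another thread may have
           * dirtied the cell between our read and our write. */
          (void)old;
      }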
* Revert "rts: Drop racy assertion"Ben Gamari2023-01-121-3/+0
| | | | | | | The logic here was inverted. Reverting the commit to avoid confusion when examining the commit history. This reverts commit b3eacd64fb36724ed6c5d2d24a81211a161abef1.
* nonmoving: Refactor update remembered set initialization | Ben Gamari | 2022-12-23 | 1 | -1/+1
  This avoids a lock inversion between the storage manager mutex and the stable pointer table mutex by not dropping the SM_MUTEX in nonmovingCollect. This requires quite a bit of rejiggering but it does seem like a better strategy.
* rts: Drop racy assertion | Ben Gamari | 2022-12-18 | 1 | -0/+3
  0e274c39bf836d5bb846f5fa08649c75f85326ac added an assertion in `dirty_MUT_VAR` checking that the MUT_VAR being dirtied was clean. However, this isn't necessarily the case since another thread may have raced us to dirty the object.
* rts: Encapsulate access to capabilities array | Ben Gamari | 2022-12-16 | 1 | -18/+18
* rts: Introduce getNumCapabilities | Ben Gamari | 2022-12-16 | 1 | -7/+7
  And ensure accesses to n_capabilities are atomic (although with relaxed ordering). This is necessary as RTS API callers may concurrently call into the RTS without holding a capability.
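  A hedged sketch of the idea (names are illustrative stand-ins, not the exact RTS definitions): reads of the capability count from code that holds no capability go through a relaxed atomic load rather than a plain read.

      #include <stdint.h>

      static uint32_t n_capabilities;   /* updated when the RTS grows/shrinks */

      /* Stand-in for getNumCapabilities: relaxed ordering is enough here;
       * we only need the read itself to be free of data races. */
      static inline uint32_t get_num_capabilities(void)
      {
          return __atomic_load_n(&n_capabilities, __ATOMIC_RELAXED);
      }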
* Improve heap memory barrier Note | Ben Gamari | 2022-12-16 | 1 | -2/+2
  Also introduce MUT_FIELD marker in Closures.h to document mutable fields.
* rts: make flushExec a no-op on wasm32 | Cheng Shao | 2022-11-11 | 1 | -0/+1
  This patch makes flushExec a no-op on wasm32, since there's no such thing as executable memory on wasm32 in the first place.
* rts: Address failures to inline | Douglas Wilson | 2022-02-02 | 1 | -1/+2
* Fix a few Note inconsistencies | Ben Gamari | 2022-02-01 | 1 | -7/+2
* Require all dirty_MUT_VAR callers to do explicit stg_MUT_VAR_CLEAN_info comparison (#20088) | nineonine | 2021-12-02 | 1 | -7/+9
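  A hedged sketch of the caller-side pattern this change asks for, with simplified C stand-ins for the Cmm/RTS types (the real code lives in the write primops and dirty_MUT_VAR):

      typedef struct { const void *info; void *var; } MutVar;   /* stand-in */

      extern const void *MUT_VAR_CLEAN_info;   /* stand-in for stg_MUT_VAR_CLEAN_info */
      void dirty_MUT_VAR(MutVar *mv);          /* stand-in declaration */

      void write_mut_var(MutVar *mv, void *new_value)
      {
          /* The caller, not dirty_MUT_VAR, now checks for cleanliness. */
          if (mv->info == MUT_VAR_CLEAN_info) {
              dirty_MUT_VAR(mv);
          }
          mv->var = new_value;
      }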
* Make `PosixSource.h` installed and under `rts/` | John Ericson | 2021-08-09 | 1 | -3/+3
  `PosixSource.h` is used outside of the rts, so we do this rather than just fish it out of the repo in an ad-hoc way, in order to make packages in this repo more self-contained.
* rts: Drop allocateExec and friends | Ben Gamari | 2021-07-27 | 1 | -92/+0
  All uses of these now use ExecPage.
* rts: Move libffi interfaces all to Adjustor | Ben Gamari | 2021-07-27 | 1 | -83/+2
  Previously the libffi Adjustor implementation would use allocateExec to create executable mappings. However, allocateExec is also used elsewhere in GHC to allocate things other than ffi_closure, which is a use case that libffi does not support.
* Guard Allocate Exec via LIBFFI by LIBFFI | Moritz Angermann | 2021-06-20 | 1 | -1/+1
  We now have two Darwin flavours: AArch64-Darwin and x86_64-darwin. The latter has proper custom adjustor support; the former relies on libffi. Mixing both leads to odd crashes, as the closures might not fit the size of the libffi closures. Hence this needs to be guarded by the USE_LIBFFI_FOR_ADJUSTORS guard. Original patch by Hamish Mackenzie.
* Allocate Adjustors and mark them readable in two steps | Moritz Angermann | 2021-03-29 | 1 | -1/+36
  This drops allocateExec for darwin and replaces it with an alloc, write, mark-executable strategy instead. This prevents us from trying to allocate an executable range and then write to it, which W^X will prohibit on darwin. This will *only* work if we can use mmap.
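  A hedged sketch of the alloc/write/mark-executable strategy using POSIX mmap/mprotect (illustrative only; the RTS implementation differs in detail):

      #include <string.h>
      #include <sys/mman.h>

      void *alloc_adjustor(const void *code, size_t len)
      {
          /* Step 1: map the page writable but not executable. */
          void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (p == MAP_FAILED) return NULL;

          /* Step 2: write the code while the page is writable. */
          memcpy(p, code, len);

          /* Step 3: flip the page to read/execute; never writable and
           * executable at the same time. */
          if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) {
              munmap(p, len);
              return NULL;
          }
          return p;
      }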
* Make traceHeapEventInfo an init event | Matthew Pickering | 2021-03-14 | 1 | -6/+18
  This means it will be re-posted every time the eventlog is started.
* rts: Use a separate free block list for allocatePinned | Matthew Pickering | 2021-03-08 | 1 | -13/+149
  The way in which allocatePinned took blocks out of the nursery was leading to horrible fragmentation in some workloads. The strategy now is that a separate free block list is reserved for each capability and blocks are taken from there. When it's empty, the global SM lock is taken and a fresh block of size PINNED_EMPTY_SIZE is allocated. Fixes #19481.
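  A hedged, simplified sketch of that strategy with stand-in types (the real code works on bdescr and PINNED_EMPTY_SIZE):

      #include <stdlib.h>

      typedef struct Block { struct Block *link; } Block;   /* stand-in for bdescr */
      typedef struct { Block *pinned_free; } Capability;    /* per-capability free list */

      /* Stand-in for the slow path: the RTS takes the SM lock here and
       * carves PINNED_EMPTY_SIZE worth of blocks out of the block allocator. */
      static Block *refill_from_global_pool(size_t nblocks)
      {
          Block *head = NULL;
          for (size_t i = 0; i < nblocks; i++) {
              Block *b = malloc(sizeof *b);
              b->link = head;
              head = b;
          }
          return head;
      }

      Block *alloc_pinned_block(Capability *cap)
      {
          if (cap->pinned_free == NULL)
              cap->pinned_free = refill_from_global_pool(32 /* hypothetical */);
          Block *b = cap->pinned_free;
          cap->pinned_free = b->link;
          return b;
      }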
* rts: Add generic block traversal function, listAllBlocks | Matthew Pickering | 2021-02-18 | 1 | -0/+36
  This function is exposed in RtsAPI.h so that external users have a blessed way to traverse all the different `bdescr`s which are known by the RTS. The main motivation is to use this function in ghc-debug but avoid having to expose the internal structure of a Capability in the API.
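  A hedged usage sketch, assuming a callback-style signature along the lines of what RtsAPI.h exposes (the callback gets an opaque user pointer and a block descriptor; check the header for the exact types):

      #include <stddef.h>
      #include "Rts.h"
      #include "RtsAPI.h"

      /* Count every block descriptor known to the RTS. */
      static void count_block(void *user, bdescr *bd)
      {
          (void)bd;
          (*(size_t *)user)++;
      }

      size_t countHeapBlocks(void)
      {
          size_t n = 0;
          listAllBlocks(count_block, &n);
          return n;
      }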
* rts: Zero shrunk array slop in vanilla RTS | Ben Gamari | 2021-01-07 | 1 | -4/+9
  But only when profiling or DEBUG are enabled. Fixes #17572.
* Storage: Unconditionally enable zeroing of alignment slop | Ben Gamari | 2021-01-07 | 1 | -11/+11
  This is necessary since the user may enable `+RTS -hT` at any time.
* AArch64/arm64 adjustments | Moritz Angermann | 2020-11-15 | 1 | -4/+4
  This adds the necessary logic to support aarch64 on ELF, as well as aarch64 on Mach-O, which Apple calls arm64. We change the architecture name to AArch64, which is the official Arm naming scheme.
* rts: Introduce highMemDynamic | GHC GitLab CI | 2020-11-11 | 1 | -1/+8
* Merge branch 'wip/tsan/stats' into wip/tsan/all | Ben Gamari | 2020-11-01 | 1 | -1/+1
* rts: Tear down stats_mutex after exitHeapProfiling (wip/tsan/stats) | Ben Gamari | 2020-11-01 | 1 | -1/+1
  Since the latter wants to call getRTSStats.
* rts/Storage: Accept races on heap size counters | Ben Gamari | 2020-10-30 | 1 | -5/+8
* rts/Storage: Use atomics | Ben Gamari | 2020-10-24 | 1 | -18/+17
* RTS: avoid overflow on 32-bit arch (#18375) | Sylvain Henry | 2020-06-25 | 1 | -2/+2
  We're now correctly computing allocated bytes on 32-bit arch, so we get huge increases.
  Metric Increase: haddock.Cabal haddock.base haddock.compiler space_leak_001
* Cleanup OVERWRITING_CLOSURE logic | Daniel Gröber | 2020-06-01 | 1 | -1/+1
  The code is just more confusing than it needs to be. We don't need to mix the threaded check with the ldv profiling check since ldv's init already checks for this. Hence they can be two separate checks. Taking the sanity checking into account is also cleaner via DebugFlags.sanity. No need for checking the DEBUG define. The ZERO_SLOP_FOR_LDV_PROF and ZERO_SLOP_FOR_SANITY_CHECK definitions the old code had also made things a lot more opaque IMO, so I removed those.
* nonmoving: Fix handling of dirty objects | Ben Gamari | 2020-05-06 | 1 | -0/+3
  Previously we (incorrectly) relied on failed_to_evac to be "precise". That is, we expected it to only be true if *all* of an object's fields lived outside of the non-moving heap. However, this does not match the behavior of failed_to_evac, which is true if *any* of the object's fields weren't promoted (meaning that some others *may* live in the non-moving heap).

  This is problematic as we skip the non-moving write barrier for dirty objects (which we can only safely do if *all* fields point outside of the non-moving heap).

  Clearly this arises due to a fundamental difference in the behavior expected of failed_to_evac in the moving and non-moving collector. E.g., in the moving collector it is always safe to conservatively say failed_to_evac=true, whereas in the non-moving collector the safe value is false.

  This issue went unnoticed as I never wrote down the dirtiness invariant enforced by the non-moving collector. We now define this invariant as: "An object being marked as dirty implies that all of its fields are on the mark queue (or, equivalently, the update remembered set)."

  To maintain this invariant we teach nonmovingScavengeOne to push the fields of objects which we fail to evacuate to the update remembered set. This is a simple and reasonably cheap solution and avoids the complexity and fragility that other, more strict alternative invariants would require.

  All of this is described in a new Note, Note [Dirty flags in the non-moving collector] in NonMoving.c.
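  A hedged stand-in sketch of the invariant as restored here (names are illustrative, not the RTS ones): when scavenging fails to evacuate a field of an object that will remain dirty, that field is pushed to the update remembered set, so "dirty" keeps implying "all fields are on the mark queue".

      typedef struct Closure Closure;

      int  try_evacuate(Closure **field);               /* stand-in */
      void push_to_update_rem_set(Closure *referee);    /* stand-in */

      void scavenge_one_field(Closure **field)
      {
          if (!try_evacuate(field)) {
              /* failed_to_evac: preserve the dirtiness invariant */
              push_to_update_rem_set(*field);
          }
      }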
* rts: Underline some Notes as is conventional | Daniel Gröber | 2020-04-14 | 1 | -0/+1
* rts: allocatePinned: Fix confusion about word/byte units | Daniel Gröber | 2020-04-14 | 1 | -19/+22
* rts: Expand and add more notes regarding slop | Daniel Gröber | 2020-04-14 | 1 | -0/+49
* Zero out pinned block alignment slop when profiling | Daniel Gröber | 2020-04-14 | 1 | -5/+50
  The heap profiler currently cannot traverse pinned blocks because of alignment slop. This used to just be a minor annoyance as the whole block is accounted into a special cost center rather than the respective object's CCS, cf. #7275. However for the new root profiler we would like to be able to visit _every_ closure on the heap. We need to do this so we can get rid of the current 'flip' bit hack in the heap traversal code.

  Since info pointers are always non-zero we can in principle skip all the slop in the profiler if we can rely on it being zeroed. This assumption caused problems in the past though: commit a586b33f8e ("rts: Correct handling of LARGE ARR_WORDS in LDV profiler"), part of !1118, tried to use the same trick for BF_LARGE objects but neglected to take into account that shrink*Array# functions don't ensure that slop is zeroed when not compiling with profiling. Later, commit 0c114c6599 ("Handle large ARR_WORDS in heap census (fix as we will only be assuming slop is zeroed when profiling is on.

  This commit also reduces the amount of slop we introduce in the first place by calculating the needed alignment before doing the allocation for small objects where we know the next available address. For large objects we don't know how much alignment we'll have to do yet since those details are hidden behind the allocateMightFail function, so there we continue to allocate the maximum additional words we'll need to do the alignment.

  So we don't have to duplicate all this logic in the cmm code we pull it into the RTS allocatePinned function instead.

  Metric Decrease: T7257 haddock.Cabal haddock.base
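  A hedged sketch of the "compute the alignment up front, zero the slop when profiling" idea for small pinned allocations (illustrative arithmetic only):

      #include <stdint.h>
      #include <string.h>

      /* Bytes of padding needed so that next_free is aligned to 'align'
       * (align must be a power of two). */
      static size_t alignment_slop(uintptr_t next_free, size_t align)
      {
          return (align - (next_free & (align - 1))) & (align - 1);
      }

      static char *align_and_zero(char *next_free, size_t align, int profiling)
      {
          size_t slop = alignment_slop((uintptr_t)next_free, align);
          if (profiling && slop > 0)
              memset(next_free, 0, slop);   /* census can rely on zeroed slop */
          return next_free + slop;
      }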
* Fix rts allocateExec() on NetBSD | PHO | 2020-01-25 | 1 | -2/+3
  Similar to SELinux, NetBSD "PaX mprotect" prohibits marking a page mapping both writable and executable at the same time. Use libffi which knows how to work around it.
* Fix comment typos | Gabor Greif | 2019-12-09 | 1 | -1/+1
  The below is only necessary to fix the CI perf fluke that happened in 9897e8c8ef0b19a9571ef97a1d9bb050c1ee9121:
  -------------------------
  Metric Decrease: T5837 T6048 T9020 T12425 T12234 T13035 T12150 Naperian
  -------------------------
* Fix more typos | Brian Wignall | 2019-12-02 | 1 | -1/+1
* Nonmoving: Allow aging and refactor static objects logic | Ben Gamari | 2019-10-22 | 1 | -4/+66
  This commit does two things:
  * Allow aging of objects during the preparatory minor GC
  * Refactor handling of static objects to avoid the use of a hashtable
* Nonmoving: Ensure write barrier vanishes in non-threaded RTS | Ben Gamari | 2019-10-21 | 1 | -8/+12
* Don't cleanup until we've stopped the collector | Ben Gamari | 2019-10-20 | 1 | -0/+1
  This requires that we break nonmovingExit into two pieces, since we need to first stop the collector to relinquish any capabilities, then we need to shut down the scheduler, then we need to free the nonmoving allocators.
* rts: Implement concurrent collection in the nonmoving collector | Ben Gamari | 2019-10-20 | 1 | -7/+104
  This extends the non-moving collector to allow concurrent collection.

  The full design of the collector implemented here is described in detail in a technical note: B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell Compiler" (2018).

  This extension involves the introduction of a capability-local remembered set, known as the /update remembered set/, which tracks objects which may no longer be visible to the collector due to mutation. To maintain this remembered set we introduce a write barrier on mutations which is enabled while a concurrent mark is underway.

  The update remembered set representation is similar to that of the nonmoving mark queue, being a chunked array of `MarkEntry`s. Each `Capability` maintains a single accumulator chunk, which it flushes when (a) it is filled, or (b) the nonmoving collector enters its post-mark synchronization phase.

  While the write barrier touches a significant amount of code it is conceptually straightforward: the mutator must ensure that the referee of any pointer it overwrites is added to the update remembered set. However, there are a few details:

  * In the case of objects with a dirty flag (e.g. `MVar`s) we can exploit the fact that only the *first* mutation requires a write barrier.

  * Weak references, as usual, complicate things. In particular, we must ensure that the referee of a weak object is marked if dereferenced by the mutator. For this we (unfortunately) must introduce a read barrier, as described in Note [Concurrent read barrier on deRefWeak#] (in `NonMovingMark.c`).

  * Stable names are also a bit tricky as described in Note [Sweeping stable names in the concurrent collector] (`NonMovingSweep.c`).

  We take quite some pains to ensure that the high thread count often seen in parallel Haskell applications doesn't affect pause times. To this end we allow thread stacks to be marked either by the thread itself (when it is executed or stack-underflows) or the concurrent mark thread (if the thread owning the stack is never scheduled). There is a non-trivial handshake to ensure that this happens without racing, which is described in Note [StgStack dirtiness flags and concurrent marking].

  Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
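  A hedged sketch of the mutator-side write barrier described above, with stand-in names (the real flag and push function live in the nonmoving-GC code):

      #include <stddef.h>

      typedef struct Closure Closure;

      extern _Bool write_barrier_enabled;            /* set while a concurrent mark runs */
      void upd_rem_set_push(Closure *old_referee);   /* stand-in declaration */

      void overwrite_field(Closure **field, Closure *new_value)
      {
          if (write_barrier_enabled && *field != NULL) {
              /* Record the old referee before the mutator loses its
               * last reference to it. */
              upd_rem_set_push(*field);
          }
          *field = new_value;
      }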
* rts: Non-concurrent mark and sweep | Ömer Sinan Ağacan | 2019-10-20 | 1 | -10/+20
  This implements the core heap structure and a serial mark/sweep collector which can be used to manage the oldest-generation heap. This is the first step towards a concurrent mark-and-sweep collector aimed at low-latency applications.

  The full design of the collector implemented here is described in detail in a technical note: B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell Compiler" (2018).

  The basic heap structure used in this design is heavily inspired by: K. Ueno & A. Ohori. "A fully concurrent garbage collector for functional programs on multicore processors." /ACM SIGPLAN Notices/ Vol. 51, No. 9 (presented at ICFP 2016).

  This design is intended to allow both marking and sweeping concurrent to execution of a multi-core mutator. Unlike the Ueno design, which requires no global synchronization pauses, the collector introduced here requires a stop-the-world pause at the beginning and end of the mark phase.

  To avoid heap fragmentation, the allocator consists of a number of fixed-size /sub-allocators/. Each of these sub-allocators allocates into its own set of /segments/, themselves allocated from the block allocator. Each segment is broken into a set of fixed-size allocation blocks (which back allocations) in addition to a bitmap (used to track the liveness of blocks) and some additional metadata (also used to track liveness). This heap structure enables collection via mark-and-sweep, which can be performed concurrently via a snapshot-at-the-beginning scheme (although concurrent collection is not implemented in this patch).

  The mark queue is a fairly straightforward chunked-array structure. The representation is a bit more verbose than a typical mark queue to accommodate a combination of two features:

  * a mark FIFO, which improves the locality of marking, reducing one of the major overheads seen in mark/sweep allocators (see [1] for details)

  * the selector optimization and indirection shortcutting, which requires that we track where we found each reference to an object in case we need to update the reference at a later point (e.g. when we find that it is an indirection). See Note [Origin references in the nonmoving collector] (in `NonMovingMark.h`) for details.

  Beyond this the mark/sweep is fairly run-of-the-mill.

  [1] R. Garner, S.M. Blackburn, D. Frampton. "Effective Prefetch for Mark-Sweep Garbage Collection." ISMM 2007.

  Co-Authored-By: Ben Gamari <ben@well-typed.com>
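  A hedged, much-simplified sketch of the segment layout described above (field names and sizes are illustrative, not the definitions in NonMoving.h):

      #include <stdint.h>

      #define BLOCKS_PER_SEGMENT 1024   /* hypothetical */

      typedef struct Segment {
          struct Segment *link;                  /* next segment of this sub-allocator */
          uint8_t  bitmap[BLOCKS_PER_SEGMENT];   /* one liveness/mark byte per block */
          uint16_t block_size;                   /* all blocks in a segment share one size */
          uint16_t next_free;                    /* index of the next unallocated block */
          /* ...fixed-size allocation blocks follow the header in the real layout... */
      } Segment;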
* rts: Give stack flags proper macros | Ben Gamari | 2019-10-18 | 1 | -2/+2
  These were previously quite unclear and will change a bit under the non-moving collector, so let's clear this up now.
* Simplify Configure in a few ways | John Ericson | 2019-10-12 | 1 | -2/+2
  - No need to distinguish between gcc-llvm and clang. First of all, gcc-llvm is quite old and surely unmaintained by now. Second of all, none of the code actually cares about that distinction! Now, it does make sense to consider multiple C frontends for LLVM in the form of clang vs clang-cl (same clang, yes, but tweaked interface). But this is better handled in terms of "gccish vs msvcish" and "is LLVM", yielding 4 combinations. Therefore, I don't think it is useful saving the existing code for that.

  - Get the remaining CC_LLVM_BACKEND, and also TABLES_NEXT_TO_CODE, in mk/config.h the normal way, rather than hacking it post-hoc. No point keeping these special cases around for no reason.

  - Get rid of the hand-rolled `die` function and just use `AC_MSG_ERROR`.

  - Abstract check + flag override for unregisterised and tables next to code.

  Oh, and as part of the above I also renamed/combined some variables where it felt appropriate.

  - GccIsClang -> CcLlvmBackend. This is for `AC_SUBST`, like the other camel-case ones. It was never about gcc-llvm, or Apple's renamed clang, to be clear.

  - llvm_CC_FLAVOR -> CC_LLVM_BACKEND. This is for `AC_DEFINE`, like the other all-caps snake-case ones. llvm_CC_FLAVOR was just silly indirection *and* an odd name to boot.
* Add new debug flag -DZ | Tobias Guggenmos | 2019-10-03 | 1 | -1/+1
  Zeros heap memory after gc freed it.
* Correct closure observation, construction, and mutation on weak memory machines. | Travis Whitaker | 2019-06-28 | 1 | -1/+7
  Here the following changes are introduced:

  - A read barrier machine op is added to Cmm.
  - The order in which a closure's fields are read and written is changed.
  - Memory barriers are added to RTS code to ensure correctness on out-of-order machines with weak memory ordering.

  Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this is lowered to an instruction that ensures memory reads that occur after said instruction in program order are not performed before reads coming before said instruction in program order. On machines with strong memory ordering properties (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so MO_ReadBarrier is simply erased. However, such an instruction is necessary on weakly ordered machines, e.g. ARM and PowerPC.

  Weak memory ordering has consequences for how closures are observed and mutated. For example, consider a closure that needs to be updated to an indirection. In order for the indirection to be safe for concurrent observers to enter, said observers must read the indirection's info table before they read the indirectee. Furthermore, the entering observer makes assumptions about the closure based on its info table contents, e.g. an INFO_TYPE of IND implies the closure has an indirectee pointer that is safe to follow.

  When a closure is updated with an indirection, both its info table and its indirectee must be written. With weak memory ordering, these two writes can be arbitrarily reordered, and perhaps even interleaved with other threads' reads and writes (in the absence of memory barrier instructions). Consider this example of a bad reordering:

  - An updater writes to a closure's info table (INFO_TYPE is now IND).
  - A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
  - A concurrent observer reads the closure's indirectee and enters it. (!!!)
  - An updater writes the closure's indirectee.

  Here the update to the indirectee comes too late and the concurrent observer has jumped off into the abyss. Speculative execution can also cause us issues, consider:

  - An observer is about to case on a value in closure's info table.
  - The observer speculatively reads one or more of closure's fields.
  - An updater writes to closure's info table.
  - The observer takes a branch based on the new info table value, but with the old closure fields!
  - The updater writes to the closure's other fields, but it's too late.

  Because of these effects, reads and writes to a closure's info table must be ordered carefully with respect to reads and writes to the closure's other fields, and memory barriers must be placed to ensure that reads and writes occur in program order. Specifically, updates to a closure must follow the following pattern:

  - Update the closure's (non-info table) fields.
  - Write barrier.
  - Update the closure's info table.

  Observing a closure's fields must follow the following pattern:

  - Read the closure's info pointer.
  - Read barrier.
  - Read the closure's (non-info table) fields.

  This patch updates RTS code to obey this pattern. This should fix long-standing SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting out-of-order execution) and PowerPC. This fixes issue #15449.

  Co-Authored-By: Ben Gamari <ben@well-typed.com>
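  A hedged sketch of the two ordering rules, written with C11 atomics instead of the RTS's write_barrier()/MO_ReadBarrier (stand-in types; the real code operates on Stg closures):

      #include <stdatomic.h>
      #include <stddef.h>

      typedef struct {
          _Atomic(const void *) info;   /* info table pointer */
          void *indirectee;             /* payload field */
      } Closure;

      extern const void *IND_info;      /* stand-in info table */

      void update_to_indirection(Closure *c, void *target)
      {
          c->indirectee = target;                        /* fields first      */
          atomic_store_explicit(&c->info, IND_info,
                                memory_order_release);   /* then info pointer */
      }

      void *observe(Closure *c)
      {
          const void *info = atomic_load_explicit(&c->info, memory_order_acquire);
          if (info == IND_info)
              return c->indirectee;     /* fields are visible after the acquire load */
          return NULL;
      }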
* Fix GCC warnings with __clear_cache builtin (#16867) | Sylvain Henry | 2019-06-27 | 1 | -6/+8
* Improve performance of newSmallArray# | Michal Terepeta | 2019-04-01 | 1 | -2/+2
  This:
  - Hoists part of the condition outside of the initialization loop in `stg_newSmallArrayzh`.
  - Annotates one of the unlikely branches as unlikely, also in `stg_newSmallArrayzh`.
  - Adds a couple of annotations to `allocateMightFail` indicating which branches are likely to be taken.

  Together this gives about 5% improvement.

  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
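  A hedged sketch of the kind of branch annotation involved, using GCC/Clang's __builtin_expect directly (the RTS wraps it in its own likely/unlikely macros; the function below is a stand-in, not allocateMightFail itself):

      #include <stddef.h>
      #include <stdlib.h>

      #define LIKELY(x)   __builtin_expect(!!(x), 1)
      #define UNLIKELY(x) __builtin_expect(!!(x), 0)

      void *allocate_might_fail(size_t bytes, size_t limit)
      {
          if (UNLIKELY(bytes > limit))   /* oversized requests are the rare case */
              return NULL;
          return malloc(bytes);          /* stand-in for the bump allocator */
      }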
* Update Wiki URLs to point to GitLab | Takenobu Tani | 2019-03-25 | 1 | -1/+1
  This moves all URL references to Trac Wiki to their corresponding GitLab counterparts. This substitution is classified as follows:

  1. Automated substitution using sed with Ben's mapping rule [1]
     Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...
     New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...

  2. Manual substitution for URLs containing `#` index
     Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz
     New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz

  3. Manual substitution for strings starting with `Commentary`
     Old: Commentary/XxxYyy...
     New: commentary/xxx-yyy...

  See also !539

  [1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
* Update Trac ticket URLs to point to GitLab | Ryan Scott | 2019-03-15 | 1 | -1/+1
  This moves all URL references to Trac tickets to their corresponding GitLab counterparts.