summaryrefslogtreecommitdiff
path: root/rts/StgMiscClosures.cmm
Commit message (Collapse)AuthorAgeFilesLines
* Fix a few Note inconsistenciesBen Gamari2022-02-011-4/+4
|
* Introduce stack snapshotting / cloning (#18741)Sven Tennie2021-09-231-0/+3
| | | | | | | | | | | | | | Add `StackSnapshot#` primitive type that represents a cloned stack (StgStack). The cloning interface consists of two functions, that clone either the treads own stack (cloneMyStack) or another threads stack (cloneThreadStack). The stack snapshot is offline/cold, i.e. it isn't evaluated any further. This is useful for analyses as it prevents concurrent modifications. For technical details, please see Note [Stack Cloning]. Co-authored-by: Ben Gamari <bgamari.foss@gmail.com> Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
* Work around LLVM backend overlapping register limitationsLuite Stegeman2021-05-291-83/+17
| | | | | | | | The stg_ctoi_t and stg_ret_t procedures which convert unboxed tuples between the bytecode an native calling convention were causing a panic when using the LLVM backend. Fixes #19591
* Generate GHCi bytecode from STG instead of Core and support unboxedLuite Stegeman2021-03-201-0/+268
| | | | | | tuples and sums. fixes #1257
* winio: Use SlimReaderLocks and ConditonalVariables provided by the OS ↵Tamar Christina2020-07-151-2/+2
| | | | instead of emulated ones
* Cmm: introduce SAVE_REGS/RESTORE_REGSSylvain Henry2020-06-231-2/+2
| | | | | | | | | | | | | We don't want to save both Fn and Dn register sets on x86-64 as they are aliased to the same arch register (XMMn). Moreover, when SAVE_STGREGS was used in conjunction with `jump foo [*]` which makes a set of Cmm registers alive so that they cover all arch registers used to pass parameter, we could have Fn, Dn and XMMn alive at the same time. It made the LLVM code generator choke (see #17920). Now `SAVE_REGS/RESTORE_REGS` and `jump foo [*]` use the same set of registers.
* Clean up file paths for new module hierarchyTakenobu Tani2020-06-011-1/+1
| | | | | | | | | This updates comments only. This patch replaces file references according to new module hierarchy. See also: * https://gitlab.haskell.org/ghc/ghc/-/wikis/Make-GHC-codebase-more-modular * https://gitlab.haskell.org/ghc/ghc/issues/13009
* Fix CNF handling in compacting GCÖmer Sinan Ağacan2020-04-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes #17937 Previously compacting GC simply ignored CNFs. This is mostly fine as most (see "What about small compacts?" below) CNF objects don't have outgoing pointers, and are "large" (allocated in large blocks) and large objects are not moved or compacted. However if we do GC *during* sharing-preserving compaction then the CNF will have a hash table mapping objects that have been moved to the CNF to their location in the CNF, to be able to preserve sharing. This case is handled in the copying collector, in `scavenge_compact`, where we evacuate hash table entries and then rehash the table. Compacting GC ignored this case. We now visit CNFs in all generations when threading pointers to the compacted heap and thread hash table keys. A visited CNF is added to the list `nfdata_chain`. After compaction is done, we re-visit the CNFs in that list and rehash the tables. The overhead is minimal: the list is static in `Compact.c`, and link field is added to `StgCompactNFData` closure. Programs that don't use CNFs should not be affected. To test this CNF tests are now also run in a new way 'compacting_gc', which just passes `-c` to the RTS, enabling compacting GC for the oldest generation. Before this patch the result would be: Unexpected failures: compact_gc.run compact_gc [bad exit code (139)] (compacting_gc) compact_huge_array.run compact_huge_array [bad exit code (1)] (compacting_gc) With this patch all tests pass. I can also pass `-c -DS` without any failures. What about small compacts? Small CNFs are still not handled by the compacting GC. However so far I'm unable to write a test that triggers a runtime panic ("update_fwd: unknown/strange object") by allocating a small CNF in a compated heap. It's possible that I'm missing something and it's not possible to have a small CNF. NoFib Results: -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- CS +0.1% 0.0% 0.0% +0.0% +0.0% CSD +0.1% 0.0% 0.0% 0.0% 0.0% FS +0.1% 0.0% 0.0% 0.0% 0.0% S +0.1% 0.0% 0.0% 0.0% 0.0% VS +0.1% 0.0% 0.0% 0.0% 0.0% VSD +0.1% 0.0% +0.0% +0.0% -0.0% VSM +0.1% 0.0% +0.0% -0.0% 0.0% anna +0.0% 0.0% -0.0% -0.0% -0.0% ansi +0.1% 0.0% +0.0% +0.0% +0.0% atom +0.1% 0.0% +0.0% +0.0% +0.0% awards +0.1% 0.0% +0.0% +0.0% +0.0% banner +0.1% 0.0% +0.0% +0.0% +0.0% bernouilli +0.1% 0.0% 0.0% -0.0% +0.0% binary-trees +0.1% 0.0% -0.0% -0.0% 0.0% boyer +0.1% 0.0% +0.0% +0.0% +0.0% boyer2 +0.1% 0.0% +0.0% +0.0% +0.0% bspt +0.1% 0.0% -0.0% -0.0% -0.0% cacheprof +0.1% 0.0% -0.0% -0.0% -0.0% calendar +0.1% 0.0% +0.0% +0.0% +0.0% cichelli +0.1% 0.0% +0.0% +0.0% +0.0% circsim +0.1% 0.0% +0.0% +0.0% +0.0% clausify +0.1% 0.0% -0.0% +0.0% +0.0% comp_lab_zift +0.1% 0.0% +0.0% +0.0% +0.0% compress +0.1% 0.0% +0.0% +0.0% 0.0% compress2 +0.1% 0.0% -0.0% 0.0% 0.0% constraints +0.1% 0.0% +0.0% +0.0% +0.0% cryptarithm1 +0.1% 0.0% +0.0% +0.0% +0.0% cryptarithm2 +0.1% 0.0% +0.0% +0.0% +0.0% cse +0.1% 0.0% +0.0% +0.0% +0.0% digits-of-e1 +0.1% 0.0% +0.0% -0.0% -0.0% digits-of-e2 +0.1% 0.0% -0.0% -0.0% -0.0% dom-lt +0.1% 0.0% +0.0% +0.0% +0.0% eliza +0.1% 0.0% +0.0% +0.0% +0.0% event +0.1% 0.0% +0.0% +0.0% +0.0% exact-reals +0.1% 0.0% +0.0% +0.0% +0.0% exp3_8 +0.1% 0.0% +0.0% -0.0% 0.0% expert +0.1% 0.0% +0.0% +0.0% +0.0% fannkuch-redux +0.1% 0.0% -0.0% 0.0% 0.0% fasta +0.1% 0.0% -0.0% +0.0% +0.0% fem +0.1% 0.0% -0.0% +0.0% 0.0% fft +0.1% 0.0% -0.0% +0.0% +0.0% fft2 +0.1% 0.0% +0.0% +0.0% +0.0% fibheaps +0.1% 0.0% +0.0% +0.0% +0.0% fish +0.1% 0.0% +0.0% +0.0% +0.0% fluid +0.0% 0.0% +0.0% +0.0% +0.0% fulsom +0.1% 0.0% -0.0% +0.0% 0.0% gamteb +0.1% 0.0% +0.0% +0.0% 0.0% gcd +0.1% 0.0% +0.0% +0.0% +0.0% gen_regexps +0.1% 0.0% -0.0% +0.0% 0.0% genfft +0.1% 0.0% +0.0% +0.0% +0.0% gg +0.1% 0.0% 0.0% +0.0% +0.0% grep +0.1% 0.0% -0.0% +0.0% +0.0% hidden +0.1% 0.0% +0.0% -0.0% 0.0% hpg +0.1% 0.0% -0.0% -0.0% -0.0% ida +0.1% 0.0% +0.0% +0.0% +0.0% infer +0.1% 0.0% +0.0% 0.0% -0.0% integer +0.1% 0.0% +0.0% +0.0% +0.0% integrate +0.1% 0.0% -0.0% -0.0% -0.0% k-nucleotide +0.1% 0.0% +0.0% +0.0% 0.0% kahan +0.1% 0.0% +0.0% +0.0% +0.0% knights +0.1% 0.0% -0.0% -0.0% -0.0% lambda +0.1% 0.0% +0.0% +0.0% -0.0% last-piece +0.1% 0.0% +0.0% 0.0% 0.0% lcss +0.1% 0.0% +0.0% +0.0% 0.0% life +0.1% 0.0% -0.0% +0.0% +0.0% lift +0.1% 0.0% +0.0% +0.0% +0.0% linear +0.1% 0.0% -0.0% +0.0% 0.0% listcompr +0.1% 0.0% +0.0% +0.0% +0.0% listcopy +0.1% 0.0% +0.0% +0.0% +0.0% maillist +0.1% 0.0% +0.0% -0.0% -0.0% mandel +0.1% 0.0% +0.0% +0.0% 0.0% mandel2 +0.1% 0.0% +0.0% +0.0% +0.0% mate +0.1% 0.0% +0.0% 0.0% +0.0% minimax +0.1% 0.0% -0.0% 0.0% -0.0% mkhprog +0.1% 0.0% +0.0% +0.0% +0.0% multiplier +0.1% 0.0% +0.0% 0.0% 0.0% n-body +0.1% 0.0% +0.0% +0.0% +0.0% nucleic2 +0.1% 0.0% +0.0% +0.0% +0.0% para +0.1% 0.0% 0.0% +0.0% +0.0% paraffins +0.1% 0.0% +0.0% -0.0% 0.0% parser +0.1% 0.0% -0.0% -0.0% -0.0% parstof +0.1% 0.0% +0.0% +0.0% +0.0% pic +0.1% 0.0% -0.0% -0.0% 0.0% pidigits +0.1% 0.0% +0.0% -0.0% -0.0% power +0.1% 0.0% +0.0% +0.0% +0.0% pretty +0.1% 0.0% -0.0% -0.0% -0.1% primes +0.1% 0.0% -0.0% -0.0% -0.0% primetest +0.1% 0.0% -0.0% -0.0% -0.0% prolog +0.1% 0.0% -0.0% -0.0% -0.0% puzzle +0.1% 0.0% -0.0% -0.0% -0.0% queens +0.1% 0.0% +0.0% +0.0% +0.0% reptile +0.1% 0.0% -0.0% -0.0% +0.0% reverse-complem +0.1% 0.0% +0.0% 0.0% -0.0% rewrite +0.1% 0.0% -0.0% -0.0% -0.0% rfib +0.1% 0.0% +0.0% +0.0% +0.0% rsa +0.1% 0.0% -0.0% +0.0% -0.0% scc +0.1% 0.0% -0.0% -0.0% -0.1% sched +0.1% 0.0% +0.0% +0.0% +0.0% scs +0.1% 0.0% +0.0% +0.0% +0.0% simple +0.1% 0.0% -0.0% -0.0% -0.0% solid +0.1% 0.0% +0.0% +0.0% +0.0% sorting +0.1% 0.0% -0.0% -0.0% -0.0% spectral-norm +0.1% 0.0% +0.0% +0.0% +0.0% sphere +0.1% 0.0% -0.0% -0.0% -0.0% symalg +0.1% 0.0% -0.0% -0.0% -0.0% tak +0.1% 0.0% +0.0% +0.0% +0.0% transform +0.1% 0.0% +0.0% +0.0% +0.0% treejoin +0.1% 0.0% +0.0% -0.0% -0.0% typecheck +0.1% 0.0% +0.0% +0.0% +0.0% veritas +0.0% 0.0% +0.0% +0.0% +0.0% wang +0.1% 0.0% 0.0% +0.0% +0.0% wave4main +0.1% 0.0% +0.0% +0.0% +0.0% wheel-sieve1 +0.1% 0.0% +0.0% +0.0% +0.0% wheel-sieve2 +0.1% 0.0% +0.0% +0.0% +0.0% x2n1 +0.1% 0.0% +0.0% +0.0% +0.0% -------------------------------------------------------------------------------- Min +0.0% 0.0% -0.0% -0.0% -0.1% Max +0.1% 0.0% +0.0% +0.0% +0.0% Geometric Mean +0.1% -0.0% -0.0% -0.0% -0.0% Bumping numbers of nonsensical perf tests: Metric Increase: T12150 T12234 T12425 T13035 T5837 T6048 It's simply not possible for this patch to increase allocations, and I've wasted enough time on these test in the past (see #17686). I think these tests should not be perf tests, but for now I'll bump the numbers.
* Improve and refactor StgToCmm codegen for DataCons.Andreas Klebinger2020-04-031-1/+1
| | | | | | | | | | | | | | | | We now differentiate three cases of constructor bindings: 1)Bindings which we can "replace" with a reference to an existing closure. Reference the replacement closure when accessing the binding. 2)Bindings which we can "replace" as above. But we still generate a closure which will be referenced by modules importing this binding. 3)For any other binding generate a closure. Then reference it. Before this patch 1) did only apply to local bindings and we didn't do 2) at all.
* Remove unused closure stg_IND_directÖmer Sinan Ağacan2020-04-021-9/+0
|
* Module hierarchy: Cmm (cf #13009)Sylvain Henry2020-01-251-1/+1
|
* Fix COMPACT_NFDATA closure size, more CNF sanity checkingÖmer Sinan Ağacan2019-08-181-2/+2
| | | | | | We now do a shallow closure check on objects in compact regions. See the new comment on why we can't do a "normal" closure check.
* Expand the preallocated Int range to [-16,255]Andreas Klebinger2019-07-301-1/+241
| | | | | | | | | | | | | | | | Effects as I measured them: RTS Size: +0.1% Compile times: -0.5% Runtine nofib: -1.1% Nofib runtime result seems to mostly come from the `CS` benchmark which is very sensible to alignment changes so this is likely over represented. However the compile time changes are realistic. This is related to #16961.
* Correct closure observation, construction, and mutation on weak memory machines.Travis Whitaker2019-06-281-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here the following changes are introduced: - A read barrier machine op is added to Cmm. - The order in which a closure's fields are read and written is changed. - Memory barriers are added to RTS code to ensure correctness on out-or-order machines with weak memory ordering. Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this is lowered to an instruction that ensures memory reads that occur after said instruction in program order are not performed before reads coming before said instruction in program order. On machines with strong memory ordering properties (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so MO_ReadBarrier is simply erased. However, such an instruction is necessary on weakly ordered machines, e.g. ARM and PowerPC. Weam memory ordering has consequences for how closures are observed and mutated. For example, consider a closure that needs to be updated to an indirection. In order for the indirection to be safe for concurrent observers to enter, said observers must read the indirection's info table before they read the indirectee. Furthermore, the entering observer makes assumptions about the closure based on its info table contents, e.g. an INFO_TYPE of IND imples the closure has an indirectee pointer that is safe to follow. When a closure is updated with an indirection, both its info table and its indirectee must be written. With weak memory ordering, these two writes can be arbitrarily reordered, and perhaps even interleaved with other threads' reads and writes (in the absence of memory barrier instructions). Consider this example of a bad reordering: - An updater writes to a closure's info table (INFO_TYPE is now IND). - A concurrent observer branches upon reading the closure's INFO_TYPE as IND. - A concurrent observer reads the closure's indirectee and enters it. (!!!) - An updater writes the closure's indirectee. Here the update to the indirectee comes too late and the concurrent observer has jumped off into the abyss. Speculative execution can also cause us issues, consider: - An observer is about to case on a value in closure's info table. - The observer speculatively reads one or more of closure's fields. - An updater writes to closure's info table. - The observer takes a branch based on the new info table value, but with the old closure fields! - The updater writes to the closure's other fields, but its too late. Because of these effects, reads and writes to a closure's info table must be ordered carefully with respect to reads and writes to the closure's other fields, and memory barriers must be placed to ensure that reads and writes occur in program order. Specifically, updates to a closure must follow the following pattern: - Update the closure's (non-info table) fields. - Write barrier. - Update the closure's info table. Observing a closure's fields must follow the following pattern: - Read the closure's info pointer. - Read barrier. - Read the closure's (non-info table) fields. This patch updates RTS code to obey this pattern. This should fix long-standing SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting out-of-order execution) and PowerPC. This fixes issue #15449. Co-Authored-By: Ben Gamari <ben@well-typed.com>
* Mark SRT_1/SRT_2 as CONSTR_1_0/CONSTR_2_0Ömer Sinan Ağacan2018-12-071-2/+2
| | | | | | | | | | | | | | | Test Plan: Validated locally, but skipped perf tests as there's a framework-related error there. Reviewers: simonmar, bgamari, erikd Reviewed By: simonmar Subscribers: rwbarton, carter Differential Revision: https://phabricator.haskell.org/D5421
* Refactor info table entry error messagesÖmer Sinan Ağacan2018-09-121-57/+57
| | | | | | | | | | | | | | | | | | | | We now show address of the entered object in error messages. Example: foo: internal error: Evaluated a CAF (0xe4c518) that was GC'd! (GHC version 8.6.0.20180907 for x86_64_unknown_linux) Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug Helpful when debugging. Test Plan: This validates Reviewers: simonmar, bgamari, erikd Reviewed By: simonmar Subscribers: rwbarton, carter Differential Revision: https://phabricator.haskell.org/D5143
* Various RTS bug fixes:Ömer Sinan Ağacan2018-09-071-1/+1
| | | | | | | | | | | | | | | | | - Retainer profiler: init_srt_thunk() should mark the stack entry as SRT - Retainer profiler: Remove an incorrect assertion about FUN_STATIC. FUN_STATIC does not have to have an SRT. - Fix nptrs of BCO Test Plan: validate Reviewers: simonmar, bgamari, erikd Reviewed By: simonmar Subscribers: rwbarton, carter Differential Revision: https://phabricator.haskell.org/D5134
* Rename some mutable closure types for consistencyÖmer Sinan Ağacan2018-06-051-8/+8
| | | | | | | | | | | | | | | | | | | | | | | SMALL_MUT_ARR_PTRS_FROZEN0 -> SMALL_MUT_ARR_PTRS_FROZEN_DIRTY SMALL_MUT_ARR_PTRS_FROZEN -> SMALL_MUT_ARR_PTRS_FROZEN_CLEAN MUT_ARR_PTRS_FROZEN0 -> MUT_ARR_PTRS_FROZEN_DIRTY MUT_ARR_PTRS_FROZEN -> MUT_ARR_PTRS_FROZEN_CLEAN Naming is now consistent with other CLEAR/DIRTY objects (MVAR, MUT_VAR, MUT_ARR_PTRS). (alternatively we could rename MVAR_DIRTY/MVAR_CLEAN etc. to MVAR0/MVAR) Removed a few comments in Scav.c about FROZEN0 being on the mut_list because it's now clear from the closure type. Reviewers: bgamari, simonmar, erikd Reviewed By: simonmar Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4784
* rts: Rip out support for STM invariantsBen Gamari2018-06-021-12/+1
| | | | | | | | | | | | | | | | | | | | | | | This feature has some very serious correctness issues (#14310), introduces a great deal of complexity, and hasn't seen wide usage. Consequently we are removing it, as proposed in Proposal #77 [1]. This is heavily based on a patch from fryguybob. Updates stm submodule. [1] https://github.com/ghc-proposals/ghc-proposals/pull/77 Test Plan: Validate Reviewers: erikd, simonmar, hvr Reviewed By: simonmar Subscribers: rwbarton, thomie, carter GHC Trac Issues: #14310 Differential Revision: https://phabricator.haskell.org/D4760
* An overhaul of the SRT representationSimon Marlow2018-05-161-1/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: - Previously we would hvae a single big table of pointers per module, with a set of bitmaps to reference entries within it. The new representation is identical to a static constructor, which is much simpler for the GC to traverse, and we get to remove the complicated bitmap-traversal code from the GC. - Rewrite all the code to generate SRTs in CmmBuildInfoTables, and document it much better (see Note [SRTs]). This has been something I've wanted to do since we moved to the new code generator, I finally had the opportunity to finish it while on a transatlantic flight recently :) There are a series of 4 diffs: 1. D4632 (this one), which does the bulk of the changes 2. D4633 which adds support for smaller `CmmLabelDiffOff` constants 3. D4634 which takes advantage of D4632 and D4633 to save a word in info tables that have an SRT on x86_64. This is where most of the binary size improvement comes from. 4. D4637 which makes a further optimisation to merge some SRTs with static FUN closures. This adds some complexity and the benefits are fairly modest, so it's not clear yet whether we should do this. Results (after (3), on x86_64) - GHC itself (staticaly linked) is 5.2% smaller - -1.7% binary sizes in nofib, -2.9% module sizes. Full nofib results: P176 - I measured the overhead of traversing all the static objects in a major GC in GHC itself by doing `replicateM_ 1000 performGC` as the first thing in `Main.main`. The new version was 5-10% faster, but the results did vary quite a bit. - I'm not sure if there's a compile-time difference, the results are too unreliable. Test Plan: validate Reviewers: bgamari, michalt, niteria, simonpj, erikd, osa1 Subscribers: thomie, carter Differential Revision: https://phabricator.haskell.org/D4632
* Add Note [BLACKHOLE points to IND]Simon Marlow2018-03-251-0/+1
| | | | | | | | | | | | Test Plan: ci Reviewers: osa1, bgamari, erikd Reviewed By: osa1 Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4517
* rts: Add --internal-counters RTS flag and several countersDouglas Wilson2018-03-191-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing internal counters: * gc_alloc_block_sync * whitehole_spin * gen[g].sync * gen[1].sync are now not shown in the -s report unless --internal-counters is also passed. If --internal-counters is passed we now show the counters above, reformatted, as well as several other counters. In particular, we now count the yieldThread() calls that SpinLocks do as well as their spins. The added counters are: * gc_spin (spin and yield) * mut_spin (spin and yield) * whitehole_threadPaused (spin only) * whitehole_executeMessage (spin only) * whitehole_lockClosure (spin only) * waitForGcThreadsd (spin and yield) As well as the following, which are not SpinLock-like things: * any_work * do_work * scav_find_work See the Note for descriptions of what these counters are. We add busy_wait_nops in these loops along with the counter increment where it was absent. Old internal counters output: ``` gc_alloc_block_sync: 0 whitehole_gc_spin: 0 gen[0].sync: 0 gen[1].sync: 0 ``` New internal counters output: ``` Internal Counters: Spins Yields gc_alloc_block_sync 323 0 gc_spin 9016713 752 mut_spin 57360944 47716 whitehole_gc 0 n/a whitehole_threadPaused 0 n/a whitehole_executeMessage 0 n/a whitehole_lockClosure 0 0 waitForGcThreads 2 415 gen[0].sync 6 0 gen[1].sync 1 0 any_work 2017 no_work 2014 scav_find_work 1004 ``` Test Plan: ./validate Check it builds with #define PROF_SPIN removed from includes/rts/Config.h Reviewers: bgamari, erikd, simonmar, hvr Reviewed By: simonmar Subscribers: rwbarton, thomie, carter GHC Trac Issues: #3553, #9221 Differential Revision: https://phabricator.haskell.org/D4302
* rts: Set unwind information for remaining stack framesBen Gamari2017-09-211-1/+2
| | | | | | | | | | Reviewers: austin, erikd, simonmar Reviewed By: simonmar Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3985
* Cmm: Add support for undefined unwinding statementsBen Gamari2017-02-081-0/+2
| | | | | | | | | | | | | | And use to mark `stg_stack_underflow_frame`, which we are unable to determine a caller from. To simplify parsing at the moment we steal the `return` keyword to indicate an undefined unwind value. Perhaps this should be revisited. Reviewers: scpmw, simonmar, austin, erikd Subscribers: dfeuer, thomie Differential Revision: https://phabricator.haskell.org/D2738
* More fixes for #5654Simon Marlow2017-01-061-0/+10
| | | | | | | | | | | | | | * In stg_ap_0_fast, if we're evaluating a thunk, the thunk might evaluate to a function in which case we may have to adjust its CCS. * The interpreter has its own implementation of stg_ap_0_fast, so we have to do the same shenanigans with creating empty PAPs and copying PAPs there. * GHCi creates Cost Centres as children of CCS_MAIN, which enterFunCCS() wrongly assumed to imply that they were CAFs. Now we use the is_caf flag for this, which we have to correctly initialise when we create a Cost Centre in GHCi.
* Overhaul of Compact Regions (#12455)Simon Marlow2016-12-071-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This commit makes various improvements and addresses some issues with Compact Regions (aka Compact Normal Forms). This was the most important thing I wanted to fix. Compaction previously prevented GC from running until it was complete, which would be a problem in a multicore setting. Now, we compact using a hand-written Cmm routine that can be interrupted at any point. When a GC is triggered during a sharing-enabled compaction, the GC has to traverse and update the hash table, so this hash table is now stored in the StgCompactNFData object. Previously, compaction consisted of a deepseq using the NFData class, followed by a traversal in C code to copy the data. This is now done in a single pass with hand-written Cmm (see rts/Compact.cmm). We no longer use the NFData instances, instead the Cmm routine evaluates components directly as it compacts. The new compaction is about 50% faster than the old one with no sharing, and a little faster on average with sharing (the cost of the hash table dominates when we're doing sharing). Static objects that don't (transitively) refer to any CAFs don't need to be copied into the compact region. In particular this means we often avoid copying Char values and small Int values, because these are static closures in the runtime. Each Compact# object can support a single compactAdd# operation at any given time, so the Data.Compact library now enforces mutual exclusion using an MVar stored in the Compact object. We now get exceptions rather than killing everything with a barf() when we encounter an object that cannot be compacted (a function, or a mutable object). We now also detect pinned objects, which can't be compacted either. The Data.Compact API has been refactored and cleaned up. A new compactSize operation returns the size (in bytes) of the compact object. Most of the documentation is in the Haddock docs for the compact library, which I've expanded and improved here. Various comments in the code have been improved, especially the main Note [Compact Normal Forms] in rts/sm/CNF.c. I've added a few tests, and expanded a few of the tests that were there. We now also run the tests with GHCi, and in a new test way that enables sanity checking (+RTS -DS). There's a benchmark in libraries/compact/tests/compact_bench.hs for measuring compaction speed and comparing sharing vs. no sharing. The field totalDataW in StgCompactNFData was unnecessary. Test Plan: * new unit tests * validate * tested manually that we can compact Data.Aeson data Reviewers: gcampax, bgamari, ezyang, austin, niteria, hvr, erikd Subscribers: thomie, simonpj Differential Revision: https://phabricator.haskell.org/D2751 GHC Trac Issues: #12455
* Remove CONSTR_STATICSimon Marlow2016-11-141-21/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: We currently have two info tables for a constructor * XXX_con_info: the info table for a heap-resident instance of the constructor, It has type CONSTR, or one of the specialised types like CONSTR_1_0 * XXX_static_info: the info table for a static instance of this constructor, which has type CONSTR_STATIC or CONSTR_STATIC_NOCAF. I'm getting rid of the latter, and using the `con_info` info table for both static and dynamic constructors. For rationale and more details see Note [static constructors] in SMRep.hs. I also removed these macros: `isSTATIC()`, `ip_STATIC()`, `closure_STATIC()`, since they relied on the CONSTR/CONSTR_STATIC distinction, and anyway HEAP_ALLOCED() does the same job. Test Plan: validate Reviewers: bgamari, simonpj, austin, gcampax, hvr, niteria, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2690 GHC Trac Issues: #12455
* Tag pointers in interpreted constructorsmniip2016-08-301-5/+8
| | | | | | | | | | | | | | | | | | | | | Instead of stg_interp_constr_entry there are now 7 functions (one for each value of the tag bits) that tag the constructor pointer before returning. This is consistent with compiled constructors' entry code, and expectations that compiled code places on compiled constructors. The iserv protocol is extended with an extra field that explains what pointer tag the constructor should use. Test Plan: Added tests for #12523 Reviewers: erikd, bgamari, hvr, austin, simonmar Reviewed By: simonmar Subscribers: osa1, thomie, rwbarton Differential Revision: https://phabricator.haskell.org/D2473 GHC Trac Issues: #12523
* Remove StgRubbishArg and CmmArgÖmer Sinan Ağacan2016-08-101-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The idea behind adding special "rubbish" arguments was in unboxed sum types depending on the tag some arguments are not used and we don't want to move some special values (like 0 for literals and some special pointer for boxed slots) for those arguments (to stack locations or registers). "StgRubbishArg" was an indicator to the code generator that the value won't be used. During Stg-to-Cmm we were then not generating any move or store instructions at all. This caused problems in the register allocator because some variables were only initialized in some code paths. As an example, suppose we have this STG: (after unarise) Lib.$WT = \r [dt_sit] case case dt_sit of { Lib.F dt_siv [Occ=Once] -> (#,,#) [1# dt_siv StgRubbishArg::GHC.Prim.Int#]; Lib.I dt_siw [Occ=Once] -> (#,,#) [2# StgRubbishArg::GHC.Types.Any dt_siw]; } of dt_six { (#,,#) us_giC us_giD us_giE -> Lib.T [us_giC us_giD us_giE]; }; This basically unpacks a sum type to an unboxed sum with 3 fields, and then moves the unboxed sum to a constructor (`Lib.T`). This is the Cmm for the inner case expression (case expression in the scrutinee position of the outer case): ciN: ... -- look at dt_sit's tag if (_ciT::P64 != 1) goto ciS; else goto ciR; ciS: -- Tag is 2, i.e. Lib.F _siw::I64 = I64[_siu::P64 + 6]; _giE::I64 = _siw::I64; _giD::P64 = stg_RUBBISH_ENTRY_info; _giC::I64 = 2; goto ciU; ciR: -- Tag is 1, i.e. Lib.I _siv::P64 = P64[_siu::P64 + 7]; _giD::P64 = _siv::P64; _giC::I64 = 1; goto ciU; Here one of the blocks `ciS` and `ciR` is executed and then the execution continues to `ciR`, but only `ciS` initializes `_giE`, in the other branch `_giE` is not initialized, because it's "rubbish" in the STG and so we don't generate an assignment during code generator. The code generator then panics during the register allocations: ghc-stage1: panic! (the 'impossible' happened) (GHC version 8.1.20160722 for x86_64-unknown-linux): LocalReg's live-in to graph ciY {_giE::I64} (`_giD` is also "rubbish" in `ciS`, but it's still initialized because it's a pointer slot, we have to initialize it otherwise garbage collector follows the pointer to some random place. So we only remove assignment if the "rubbish" arg has unboxed type.) This patch removes `StgRubbishArg` and `CmmArg`. We now always initialize rubbish slots. If the slot is for boxed types we use the existing `absentError`, otherwise we initialize the slot with literal 0. Reviewers: simonpj, erikd, austin, simonmar, bgamari Reviewed By: erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2446
* Implement unboxed sum primitive typeÖmer Sinan Ağacan2016-07-211-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch implements primitive unboxed sum types, as described in https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes. Main changes are: - Add new syntax for unboxed sums types, terms and patterns. Hidden behind `-XUnboxedSums`. - Add unlifted unboxed sum type constructors and data constructors, extend type and pattern checkers and desugarer. - Add new RuntimeRep for unboxed sums. - Extend unarise pass to translate unboxed sums to unboxed tuples right before code generation. - Add `StgRubbishArg` to `StgArg`, and a new type `CmmArg` for better code generation when sum values are involved. - Add user manual section for unboxed sums. Some other changes: - Generalize `UbxTupleRep` to `MultiRep` and `UbxTupAlt` to `MultiValAlt` to be able to use those with both sums and tuples. - Don't use `tyConPrimRep` in `isVoidTy`: `tyConPrimRep` is really wrong, given an `Any` `TyCon`, there's no way to tell what its kind is, but `kindPrimRep` and in turn `tyConPrimRep` returns `PtrRep`. - Fix some bugs on the way: #12375. Not included in this patch: - Update Haddock for new the new unboxed sum syntax. - `TemplateHaskell` support is left as future work. For reviewers: - Front-end code is mostly trivial and adapted from unboxed tuple code for type checking, pattern checking, renaming, desugaring etc. - Main translation routines are in `RepType` and `UnariseStg`. Documentation in `UnariseStg` should be enough for understanding what's going on. Credits: - Johan Tibell wrote the initial front-end and interface file extensions. - Simon Peyton Jones reviewed this patch many times, wrote some code, and helped with debugging. Reviewers: bgamari, alanz, goldfire, RyanGlScott, simonpj, austin, simonmar, hvr, erikd Reviewed By: simonpj Subscribers: Iceland_jack, ggreif, ezyang, RyanGlScott, goldfire, thomie, mpickering Differential Revision: https://phabricator.haskell.org/D2259
* Compact RegionsGiovanni Campagna2016-07-201-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | This brings in initial support for compact regions, as described in the ICFP 2015 paper "Efficient Communication and Collection with Compact Normal Forms" (Edward Z. Yang et.al.) and implemented by Giovanni Campagna. Some things may change before the 8.2 release, but I (Simon M.) wanted to get the main patch committed so that we can iterate. What documentation there is is in the Data.Compact module in the new compact package. We'll need to extend and polish the documentation before the release. Test Plan: validate (new test cases included) Reviewers: ezyang, simonmar, hvr, bgamari, austin Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd Differential Revision: https://phabricator.haskell.org/D1264 GHC Trac Issues: #11493
* rts: fix threadStackUnderflow type in cmmSergei Trofimovich2016-03-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | stg_stack_underflow_frame had an incorrect call of C function 'threadStackUnderflow': ("ptr" ret_off) = foreign "C" threadStackUnderflow( MyCapability(), CurrentTSO); Which means it's prototype is: void * (*) (W_, void*); While real prototype is: W_ (*) (Capability *cap, StgTSO *tso); The fix is simple. Fix type annotations: (ret_off) = foreign "C" threadStackUnderflow( MyCapability() "ptr", CurrentTSO "ptr"); Noticed when debugged T9045 test failure on m68k target which distincts between pointer and non pointer return types (uses different registers) While at it noticed and fixed return types for 'throwTo' and 'findSpark'. Trac #11395 Signed-off-by: Sergei Trofimovich <siarheit@google.com>
* Remove unused IND_PERMJoachim Breitner2016-01-231-39/+0
| | | | | | | | | | | | | | | | | it seems that this closure type has not been in use since 5d52d9, so all this is dead and untested code. This removes it. Some of the code might be useful for a counting indirection as described in #10613, so when implementing that, have a look at what this commit removes. Test Plan: validate on harbormaster Reviewers: austin, bgamari, simonmar Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1821
* Update comments around blackholesSimon Marlow2015-07-071-7/+9
| | | | | | | | | | Test Plan: validate Reviewers: austin, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1047
* Add unwind information to CmmPeter Wortmann2014-12-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Unwind information allows the debugger to discover more information about a program state, by allowing it to "reconstruct" other states of the program. In practice, this means that we explain to the debugger how to unravel stack frames, which comes down mostly to explaining how to find their Sp and Ip register values. * We declare yet another new constructor for CmmNode - and this time there's actually little choice, as unwind information can and will change mid-block. We don't actually make use of these capabilities, and back-end support would be tricky (generate new labels?), but it feels like the right way to do it. * Even though we only use it for Sp so far, we allow CmmUnwind to specify unwind information for any register. This is pretty cheap and could come in useful in future. * We allow full CmmExpr expressions for specifying unwind values. The advantage here is that we don't have to make up new syntax, and can e.g. use the WDS macro directly. On the other hand, the back-end will now have to simplify the expression until it can sensibly be converted into DWARF byte code - a process which might fail, yielding NCG panics. On the other hand, when you're writing Cmm by hand you really ought to know what you're doing. (From Phabricator D169)
* Add a comment about stack checksSimon Marlow2014-11-051-0/+2
|
* Revert "BC-breaking changes to C-- CLOSURE syntax."Edward Z. Yang2014-10-201-10/+10
| | | | This reverts commit 3b5a840bba375c4c4c11ccfeb283f84c3a1ef22c.
* Revert "Rename _closure to _static_closure, apply naming consistently."Edward Z. Yang2014-10-201-2/+2
| | | | | | | This reverts commit 35672072b4091d6f0031417bc160c568f22d0469. Conflicts: compiler/main/DriverPipeline.hs
* Rename _closure to _static_closure, apply naming consistently.Edward Z. Yang2014-10-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In preparation for indirecting all references to closures, we rename _closure to _static_closure to ensure any old code will get an undefined symbol error. In order to reference a closure foobar_closure (which is now undefined), you should instead use STATIC_CLOSURE(foobar). For convenience, a number of these old identifiers are macro'd. Across C-- and C (Windows and otherwise), there were differing conventions on whether or not foobar_closure or &foobar_closure was the address of the closure. Now, all foobar_closure references are addresses, and no & is necessary. CHARLIKE/INTLIKE were not changed, simply alpha-renamed. Part of remove HEAP_ALLOCED patch set (#8199) Depends on D265 Signed-off-by: Edward Z. Yang <ezyang@mit.edu> Test Plan: validate Reviewers: simonmar, austin Subscribers: simonmar, ezyang, carter, thomie Differential Revision: https://phabricator.haskell.org/D267 GHC Trac Issues: #8199
* BC-breaking changes to C-- CLOSURE syntax.Edward Z. Yang2014-10-011-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Previously, there were two variants of CLOSURE in C--: - Top-level CLOSURE(foo_closure, foo, lits...), which defines a new static closure and gives it a name, and - Array CLOSURE(foo, lits...), which was used for the static char and integer arrays. They used the same name, were confusing, and didn't even generate the correct internal label representation! So now, we have two new forms: - Top-level CLOSURE(foo, lits...) which automatically generates foo_closure (along with foo_info, which we were doing already) - Array ANONYMOUS_CLOSURE(foo, lits...) which doesn't generate a foo_closure identifier. Part of remove HEAP_ALLOCED patch set (#8199) Signed-off-by: Edward Z. Yang <ezyang@mit.edu> Test Plan: validate Reviewers: simonmar, austin Subscribers: simonmar, ezyang, carter, thomie Differential Revision: https://phabricator.haskell.org/D264 GHC Trac Issues: #8199
* Add SmallArray# and SmallMutableArray# typesJohan Tibell2014-03-291-0/+12
| | | | | | | | | | | | | | | These array types are smaller than Array# and MutableArray# and are faster when the array size is small, as they don't have the overhead of a card table. Having no card table reduces the closure size with 2 words in the typical small array case and leads to less work when updating or GC:ing the array. Reduces both the runtime and memory allocation by 8.8% on my insert benchmark for the HashMap type in the unordered-containers package, which makes use of lots of small arrays. With tuned GC settings (i.e. `+RTS -A6M`) the runtime reduction is 15%. Fixes #8923.
* Fix formatting (fixes #8551)Arash Rouhani2013-11-221-1/+1
| | | | | Signed-off-by: Arash Rouhani <rarash@student.chalmers.se> Reviewed-by: Austin Seipp <austin@well-typed.com>
* In the DEBUG rts, track when CAFs are GC'dSimon Marlow2013-11-211-0/+7
| | | | | | | | | | | | This resurrects some old code and makes it work again. The idea is that we want to get an error message if we ever enter a CAF that has been GC'd, rather than following its indirection which will likely cause a segfault. Without this patch, these bugs are hard to track down in gdb, because the IND_STATIC code overwrites R1 (the pointer to the CAF) with its indirectee before jumping into bad memory, so we've lost the address of the CAF that got GC'd. Some associated refactoring while I was here.
* Allow multiple C finalizers to be attached to a Weak#Takano Akio2013-06-151-0/+9
| | | | | | | | | | | | | The commit replaces mkWeakForeignEnv# with addCFinalizerToWeak#. This new primop mutates an existing Weak# object and adds a new C finalizer to it. This change removes an invariant in MarkWeak.c, namely that the relative order of Weak# objects in the list needs to be preserved across GC. This makes it easier to split the list into per-generation structures. The patch also removes a race condition between two threads calling finalizeWeak# on the same WEAK object at that same time.
* STM: Only wake up onceBen Gamari2013-01-301-0/+12
| | | | | | | | | | | Previously, threads blocked on an STM retry would be sent a wakeup message each time an unpark was requested. This could result in the accumulation of a large number of wake-up messages, which would slow wake-up once the sleeping thread is finally scheduled. Here, we introduce a new closure type, STM_AWOKEN, which marks a TSO which has been sent a wake-up message, allowing us to send only one wakeup.
* Add a write barrier for TVAR closuresSimon Marlow2012-11-161-2/+5
| | | | | | | | | | This improves GC performance when there are a lot of TVars in the heap. For instance, a TChan with a lot of elements causes a massive GC drag without this patch. There's more to do - several other STM closure types don't have write barriers, so GC performance when there are a lot of threads blocked on STM isn't great. But fixing the problem for TVar is a good start.
* Fix bug in stg_ctoi_R1n (could cause GHCi crashes)Simon Marlow2012-11-011-1/+1
|
* Produce new-style Cmm from the Cmm parserSimon Marlow2012-10-081-45/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main change here is that the Cmm parser now allows high-level cmm code with argument-passing and function calls. For example: foo ( gcptr a, bits32 b ) { if (b > 0) { // we can make tail calls passing arguments: jump stg_ap_0_fast(a); } return (x,y); } More details on the new cmm syntax are in Note [Syntax of .cmm files] in CmmParse.y. The old syntax is still more-or-less supported for those occasional code fragments that really need to explicitly manipulate the stack. However there are a couple of differences: it is now obligatory to give a list of live GlobalRegs on every jump, e.g. jump %ENTRY_CODE(Sp(0)) [R1]; Again, more details in Note [Syntax of .cmm files]. I have rewritten most of the .cmm files in the RTS into the new syntax, except for AutoApply.cmm which is generated by the genapply program: this file could be generated in the new syntax instead and would probably be better off for it, but I ran out of enthusiasm. Some other changes in this batch: - The PrimOp calling convention is gone, primops now use the ordinary NativeNodeCall convention. This means that primops and "foreign import prim" code must be written in high-level cmm, but they can now take more than 10 arguments. - CmmSink now does constant-folding (should fix #7219) - .cmm files now go through the cmmPipeline, and as a result we generate better code in many cases. All the object files generated for the RTS .cmm files are now smaller. Performance should be better too, but I haven't measured it yet. - RET_DYN frames are removed from the RTS, lots of code goes away - we now have some more canned GC points to cover unboxed-tuples with 2-4 pointers, which will reduce code size a little.
* Working towards fixing DLLs on Win64Ian Lynagh2012-05-061-3/+3
|
* Soem more Wind64 fixesIan Lynagh2012-03-161-3/+3
| | | | | We may need to do this differently once we get as far as building the RTS in the dyn ways.