| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add `StackSnapshot#` primitive type that represents a cloned stack (StgStack).
The cloning interface consists of two functions, that clone either the treads
own stack (cloneMyStack) or another threads stack (cloneThreadStack).
The stack snapshot is offline/cold, i.e. it isn't evaluated any further. This is
useful for analyses as it prevents concurrent modifications.
For technical details, please see Note [Stack Cloning].
Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
|
|
|
|
|
|
|
|
| |
The stg_ctoi_t and stg_ret_t procedures which convert unboxed
tuples between the bytecode an native calling convention were
causing a panic when using the LLVM backend.
Fixes #19591
|
|
|
|
|
|
| |
tuples and sums.
fixes #1257
|
|
|
|
| |
instead of emulated ones
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't want to save both Fn and Dn register sets on x86-64 as they are
aliased to the same arch register (XMMn).
Moreover, when SAVE_STGREGS was used in conjunction with `jump foo [*]`
which makes a set of Cmm registers alive so that they cover all arch
registers used to pass parameter, we could have Fn, Dn and XMMn alive at
the same time. It made the LLVM code generator choke (see #17920).
Now `SAVE_REGS/RESTORE_REGS` and `jump foo [*]` use the same set of
registers.
|
|
|
|
|
|
|
|
|
| |
This updates comments only.
This patch replaces file references according to new module hierarchy.
See also:
* https://gitlab.haskell.org/ghc/ghc/-/wikis/Make-GHC-codebase-more-modular
* https://gitlab.haskell.org/ghc/ghc/issues/13009
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes #17937
Previously compacting GC simply ignored CNFs. This is mostly fine as
most (see "What about small compacts?" below) CNF objects don't have
outgoing pointers, and are "large" (allocated in large blocks) and large
objects are not moved or compacted.
However if we do GC *during* sharing-preserving compaction then the CNF
will have a hash table mapping objects that have been moved to the CNF
to their location in the CNF, to be able to preserve sharing.
This case is handled in the copying collector, in `scavenge_compact`,
where we evacuate hash table entries and then rehash the table.
Compacting GC ignored this case.
We now visit CNFs in all generations when threading pointers to the
compacted heap and thread hash table keys. A visited CNF is added to the
list `nfdata_chain`. After compaction is done, we re-visit the CNFs in
that list and rehash the tables.
The overhead is minimal: the list is static in `Compact.c`, and link
field is added to `StgCompactNFData` closure. Programs that don't use
CNFs should not be affected.
To test this CNF tests are now also run in a new way 'compacting_gc',
which just passes `-c` to the RTS, enabling compacting GC for the oldest
generation. Before this patch the result would be:
Unexpected failures:
compact_gc.run compact_gc [bad exit code (139)] (compacting_gc)
compact_huge_array.run compact_huge_array [bad exit code (1)] (compacting_gc)
With this patch all tests pass. I can also pass `-c -DS` without any
failures.
What about small compacts? Small CNFs are still not handled by the
compacting GC. However so far I'm unable to write a test that triggers a
runtime panic ("update_fwd: unknown/strange object") by allocating a
small CNF in a compated heap. It's possible that I'm missing something
and it's not possible to have a small CNF.
NoFib Results:
--------------------------------------------------------------------------------
Program Size Allocs Instrs Reads Writes
--------------------------------------------------------------------------------
CS +0.1% 0.0% 0.0% +0.0% +0.0%
CSD +0.1% 0.0% 0.0% 0.0% 0.0%
FS +0.1% 0.0% 0.0% 0.0% 0.0%
S +0.1% 0.0% 0.0% 0.0% 0.0%
VS +0.1% 0.0% 0.0% 0.0% 0.0%
VSD +0.1% 0.0% +0.0% +0.0% -0.0%
VSM +0.1% 0.0% +0.0% -0.0% 0.0%
anna +0.0% 0.0% -0.0% -0.0% -0.0%
ansi +0.1% 0.0% +0.0% +0.0% +0.0%
atom +0.1% 0.0% +0.0% +0.0% +0.0%
awards +0.1% 0.0% +0.0% +0.0% +0.0%
banner +0.1% 0.0% +0.0% +0.0% +0.0%
bernouilli +0.1% 0.0% 0.0% -0.0% +0.0%
binary-trees +0.1% 0.0% -0.0% -0.0% 0.0%
boyer +0.1% 0.0% +0.0% +0.0% +0.0%
boyer2 +0.1% 0.0% +0.0% +0.0% +0.0%
bspt +0.1% 0.0% -0.0% -0.0% -0.0%
cacheprof +0.1% 0.0% -0.0% -0.0% -0.0%
calendar +0.1% 0.0% +0.0% +0.0% +0.0%
cichelli +0.1% 0.0% +0.0% +0.0% +0.0%
circsim +0.1% 0.0% +0.0% +0.0% +0.0%
clausify +0.1% 0.0% -0.0% +0.0% +0.0%
comp_lab_zift +0.1% 0.0% +0.0% +0.0% +0.0%
compress +0.1% 0.0% +0.0% +0.0% 0.0%
compress2 +0.1% 0.0% -0.0% 0.0% 0.0%
constraints +0.1% 0.0% +0.0% +0.0% +0.0%
cryptarithm1 +0.1% 0.0% +0.0% +0.0% +0.0%
cryptarithm2 +0.1% 0.0% +0.0% +0.0% +0.0%
cse +0.1% 0.0% +0.0% +0.0% +0.0%
digits-of-e1 +0.1% 0.0% +0.0% -0.0% -0.0%
digits-of-e2 +0.1% 0.0% -0.0% -0.0% -0.0%
dom-lt +0.1% 0.0% +0.0% +0.0% +0.0%
eliza +0.1% 0.0% +0.0% +0.0% +0.0%
event +0.1% 0.0% +0.0% +0.0% +0.0%
exact-reals +0.1% 0.0% +0.0% +0.0% +0.0%
exp3_8 +0.1% 0.0% +0.0% -0.0% 0.0%
expert +0.1% 0.0% +0.0% +0.0% +0.0%
fannkuch-redux +0.1% 0.0% -0.0% 0.0% 0.0%
fasta +0.1% 0.0% -0.0% +0.0% +0.0%
fem +0.1% 0.0% -0.0% +0.0% 0.0%
fft +0.1% 0.0% -0.0% +0.0% +0.0%
fft2 +0.1% 0.0% +0.0% +0.0% +0.0%
fibheaps +0.1% 0.0% +0.0% +0.0% +0.0%
fish +0.1% 0.0% +0.0% +0.0% +0.0%
fluid +0.0% 0.0% +0.0% +0.0% +0.0%
fulsom +0.1% 0.0% -0.0% +0.0% 0.0%
gamteb +0.1% 0.0% +0.0% +0.0% 0.0%
gcd +0.1% 0.0% +0.0% +0.0% +0.0%
gen_regexps +0.1% 0.0% -0.0% +0.0% 0.0%
genfft +0.1% 0.0% +0.0% +0.0% +0.0%
gg +0.1% 0.0% 0.0% +0.0% +0.0%
grep +0.1% 0.0% -0.0% +0.0% +0.0%
hidden +0.1% 0.0% +0.0% -0.0% 0.0%
hpg +0.1% 0.0% -0.0% -0.0% -0.0%
ida +0.1% 0.0% +0.0% +0.0% +0.0%
infer +0.1% 0.0% +0.0% 0.0% -0.0%
integer +0.1% 0.0% +0.0% +0.0% +0.0%
integrate +0.1% 0.0% -0.0% -0.0% -0.0%
k-nucleotide +0.1% 0.0% +0.0% +0.0% 0.0%
kahan +0.1% 0.0% +0.0% +0.0% +0.0%
knights +0.1% 0.0% -0.0% -0.0% -0.0%
lambda +0.1% 0.0% +0.0% +0.0% -0.0%
last-piece +0.1% 0.0% +0.0% 0.0% 0.0%
lcss +0.1% 0.0% +0.0% +0.0% 0.0%
life +0.1% 0.0% -0.0% +0.0% +0.0%
lift +0.1% 0.0% +0.0% +0.0% +0.0%
linear +0.1% 0.0% -0.0% +0.0% 0.0%
listcompr +0.1% 0.0% +0.0% +0.0% +0.0%
listcopy +0.1% 0.0% +0.0% +0.0% +0.0%
maillist +0.1% 0.0% +0.0% -0.0% -0.0%
mandel +0.1% 0.0% +0.0% +0.0% 0.0%
mandel2 +0.1% 0.0% +0.0% +0.0% +0.0%
mate +0.1% 0.0% +0.0% 0.0% +0.0%
minimax +0.1% 0.0% -0.0% 0.0% -0.0%
mkhprog +0.1% 0.0% +0.0% +0.0% +0.0%
multiplier +0.1% 0.0% +0.0% 0.0% 0.0%
n-body +0.1% 0.0% +0.0% +0.0% +0.0%
nucleic2 +0.1% 0.0% +0.0% +0.0% +0.0%
para +0.1% 0.0% 0.0% +0.0% +0.0%
paraffins +0.1% 0.0% +0.0% -0.0% 0.0%
parser +0.1% 0.0% -0.0% -0.0% -0.0%
parstof +0.1% 0.0% +0.0% +0.0% +0.0%
pic +0.1% 0.0% -0.0% -0.0% 0.0%
pidigits +0.1% 0.0% +0.0% -0.0% -0.0%
power +0.1% 0.0% +0.0% +0.0% +0.0%
pretty +0.1% 0.0% -0.0% -0.0% -0.1%
primes +0.1% 0.0% -0.0% -0.0% -0.0%
primetest +0.1% 0.0% -0.0% -0.0% -0.0%
prolog +0.1% 0.0% -0.0% -0.0% -0.0%
puzzle +0.1% 0.0% -0.0% -0.0% -0.0%
queens +0.1% 0.0% +0.0% +0.0% +0.0%
reptile +0.1% 0.0% -0.0% -0.0% +0.0%
reverse-complem +0.1% 0.0% +0.0% 0.0% -0.0%
rewrite +0.1% 0.0% -0.0% -0.0% -0.0%
rfib +0.1% 0.0% +0.0% +0.0% +0.0%
rsa +0.1% 0.0% -0.0% +0.0% -0.0%
scc +0.1% 0.0% -0.0% -0.0% -0.1%
sched +0.1% 0.0% +0.0% +0.0% +0.0%
scs +0.1% 0.0% +0.0% +0.0% +0.0%
simple +0.1% 0.0% -0.0% -0.0% -0.0%
solid +0.1% 0.0% +0.0% +0.0% +0.0%
sorting +0.1% 0.0% -0.0% -0.0% -0.0%
spectral-norm +0.1% 0.0% +0.0% +0.0% +0.0%
sphere +0.1% 0.0% -0.0% -0.0% -0.0%
symalg +0.1% 0.0% -0.0% -0.0% -0.0%
tak +0.1% 0.0% +0.0% +0.0% +0.0%
transform +0.1% 0.0% +0.0% +0.0% +0.0%
treejoin +0.1% 0.0% +0.0% -0.0% -0.0%
typecheck +0.1% 0.0% +0.0% +0.0% +0.0%
veritas +0.0% 0.0% +0.0% +0.0% +0.0%
wang +0.1% 0.0% 0.0% +0.0% +0.0%
wave4main +0.1% 0.0% +0.0% +0.0% +0.0%
wheel-sieve1 +0.1% 0.0% +0.0% +0.0% +0.0%
wheel-sieve2 +0.1% 0.0% +0.0% +0.0% +0.0%
x2n1 +0.1% 0.0% +0.0% +0.0% +0.0%
--------------------------------------------------------------------------------
Min +0.0% 0.0% -0.0% -0.0% -0.1%
Max +0.1% 0.0% +0.0% +0.0% +0.0%
Geometric Mean +0.1% -0.0% -0.0% -0.0% -0.0%
Bumping numbers of nonsensical perf tests:
Metric Increase:
T12150
T12234
T12425
T13035
T5837
T6048
It's simply not possible for this patch to increase allocations, and
I've wasted enough time on these test in the past (see #17686). I think
these tests should not be perf tests, but for now I'll bump the numbers.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We now differentiate three cases of constructor bindings:
1)Bindings which we can "replace" with a reference to
an existing closure. Reference the replacement closure
when accessing the binding.
2)Bindings which we can "replace" as above. But we still
generate a closure which will be referenced by modules
importing this binding.
3)For any other binding generate a closure. Then reference
it.
Before this patch 1) did only apply to local bindings and we
didn't do 2) at all.
|
| |
|
| |
|
|
|
|
|
|
| |
We now do a shallow closure check on objects in compact regions.
See the new comment on why we can't do a "normal" closure check.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Effects as I measured them:
RTS Size: +0.1%
Compile times: -0.5%
Runtine nofib: -1.1%
Nofib runtime result seems to mostly come from the `CS` benchmark
which is very sensible to alignment changes so this is likely over
represented.
However the compile time changes are realistic.
This is related to #16961.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here the following changes are introduced:
- A read barrier machine op is added to Cmm.
- The order in which a closure's fields are read and written is changed.
- Memory barriers are added to RTS code to ensure correctness on
out-or-order machines with weak memory ordering.
Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
is lowered to an instruction that ensures memory reads that occur after said
instruction in program order are not performed before reads coming before said
instruction in program order. On machines with strong memory ordering properties
(e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
MO_ReadBarrier is simply erased. However, such an instruction is necessary on
weakly ordered machines, e.g. ARM and PowerPC.
Weam memory ordering has consequences for how closures are observed and mutated.
For example, consider a closure that needs to be updated to an indirection. In
order for the indirection to be safe for concurrent observers to enter, said
observers must read the indirection's info table before they read the
indirectee. Furthermore, the entering observer makes assumptions about the
closure based on its info table contents, e.g. an INFO_TYPE of IND imples the
closure has an indirectee pointer that is safe to follow.
When a closure is updated with an indirection, both its info table and its
indirectee must be written. With weak memory ordering, these two writes can be
arbitrarily reordered, and perhaps even interleaved with other threads' reads
and writes (in the absence of memory barrier instructions). Consider this
example of a bad reordering:
- An updater writes to a closure's info table (INFO_TYPE is now IND).
- A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
- A concurrent observer reads the closure's indirectee and enters it. (!!!)
- An updater writes the closure's indirectee.
Here the update to the indirectee comes too late and the concurrent observer has
jumped off into the abyss. Speculative execution can also cause us issues,
consider:
- An observer is about to case on a value in closure's info table.
- The observer speculatively reads one or more of closure's fields.
- An updater writes to closure's info table.
- The observer takes a branch based on the new info table value, but with the
old closure fields!
- The updater writes to the closure's other fields, but its too late.
Because of these effects, reads and writes to a closure's info table must be
ordered carefully with respect to reads and writes to the closure's other
fields, and memory barriers must be placed to ensure that reads and writes occur
in program order. Specifically, updates to a closure must follow the following
pattern:
- Update the closure's (non-info table) fields.
- Write barrier.
- Update the closure's info table.
Observing a closure's fields must follow the following pattern:
- Read the closure's info pointer.
- Read barrier.
- Read the closure's (non-info table) fields.
This patch updates RTS code to obey this pattern. This should fix long-standing
SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
out-of-order execution) and PowerPC. This fixes issue #15449.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan:
Validated locally, but skipped perf tests as there's a
framework-related error
there.
Reviewers: simonmar, bgamari, erikd
Reviewed By: simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5421
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We now show address of the entered object in error messages. Example:
foo: internal error: Evaluated a CAF (0xe4c518) that was GC'd!
(GHC version 8.6.0.20180907 for x86_64_unknown_linux)
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
Helpful when debugging.
Test Plan: This validates
Reviewers: simonmar, bgamari, erikd
Reviewed By: simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5143
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Retainer profiler: init_srt_thunk() should mark the stack entry as SRT
- Retainer profiler: Remove an incorrect assertion about FUN_STATIC.
FUN_STATIC does not have to have an SRT.
- Fix nptrs of BCO
Test Plan: validate
Reviewers: simonmar, bgamari, erikd
Reviewed By: simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5134
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SMALL_MUT_ARR_PTRS_FROZEN0 -> SMALL_MUT_ARR_PTRS_FROZEN_DIRTY
SMALL_MUT_ARR_PTRS_FROZEN -> SMALL_MUT_ARR_PTRS_FROZEN_CLEAN
MUT_ARR_PTRS_FROZEN0 -> MUT_ARR_PTRS_FROZEN_DIRTY
MUT_ARR_PTRS_FROZEN -> MUT_ARR_PTRS_FROZEN_CLEAN
Naming is now consistent with other CLEAR/DIRTY objects (MVAR, MUT_VAR,
MUT_ARR_PTRS).
(alternatively we could rename MVAR_DIRTY/MVAR_CLEAN etc. to MVAR0/MVAR)
Removed a few comments in Scav.c about FROZEN0 being on the mut_list
because it's now clear from the closure type.
Reviewers: bgamari, simonmar, erikd
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4784
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This feature has some very serious correctness issues (#14310),
introduces a great deal of complexity, and hasn't seen wide usage.
Consequently we are removing it, as proposed in Proposal #77 [1]. This
is heavily based on a patch from fryguybob.
Updates stm submodule.
[1] https://github.com/ghc-proposals/ghc-proposals/pull/77
Test Plan: Validate
Reviewers: erikd, simonmar, hvr
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #14310
Differential Revision: https://phabricator.haskell.org/D4760
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- Previously we would hvae a single big table of pointers per module,
with a set of bitmaps to reference entries within it. The new
representation is identical to a static constructor, which is much
simpler for the GC to traverse, and we get to remove the complicated
bitmap-traversal code from the GC.
- Rewrite all the code to generate SRTs in CmmBuildInfoTables, and
document it much better (see Note [SRTs]). This has been something
I've wanted to do since we moved to the new code generator, I
finally had the opportunity to finish it while on a transatlantic
flight recently :)
There are a series of 4 diffs:
1. D4632 (this one), which does the bulk of the changes
2. D4633 which adds support for smaller `CmmLabelDiffOff` constants
3. D4634 which takes advantage of D4632 and D4633 to save a word in
info tables that have an SRT on x86_64. This is where most of the
binary size improvement comes from.
4. D4637 which makes a further optimisation to merge some SRTs with
static FUN closures. This adds some complexity and the benefits
are fairly modest, so it's not clear yet whether we should do this.
Results (after (3), on x86_64)
- GHC itself (staticaly linked) is 5.2% smaller
- -1.7% binary sizes in nofib, -2.9% module sizes. Full nofib results: P176
- I measured the overhead of traversing all the static objects in a
major GC in GHC itself by doing `replicateM_ 1000 performGC` as the
first thing in `Main.main`. The new version was 5-10% faster, but
the results did vary quite a bit.
- I'm not sure if there's a compile-time difference, the results are
too unreliable.
Test Plan: validate
Reviewers: bgamari, michalt, niteria, simonpj, erikd, osa1
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4632
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: ci
Reviewers: osa1, bgamari, erikd
Reviewed By: osa1
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4517
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The existing internal counters:
* gc_alloc_block_sync
* whitehole_spin
* gen[g].sync
* gen[1].sync
are now not shown in the -s report unless --internal-counters is also passed.
If --internal-counters is passed we now show the counters above, reformatted, as
well as several other counters. In particular, we now count the yieldThread()
calls that SpinLocks do as well as their spins.
The added counters are:
* gc_spin (spin and yield)
* mut_spin (spin and yield)
* whitehole_threadPaused (spin only)
* whitehole_executeMessage (spin only)
* whitehole_lockClosure (spin only)
* waitForGcThreadsd (spin and yield)
As well as the following, which are not SpinLock-like things:
* any_work
* do_work
* scav_find_work
See the Note for descriptions of what these counters are.
We add busy_wait_nops in these loops along with the counter increment where it
was absent.
Old internal counters output:
```
gc_alloc_block_sync: 0
whitehole_gc_spin: 0
gen[0].sync: 0
gen[1].sync: 0
```
New internal counters output:
```
Internal Counters:
Spins Yields
gc_alloc_block_sync 323 0
gc_spin 9016713 752
mut_spin 57360944 47716
whitehole_gc 0 n/a
whitehole_threadPaused 0 n/a
whitehole_executeMessage 0 n/a
whitehole_lockClosure 0 0
waitForGcThreads 2 415
gen[0].sync 6 0
gen[1].sync 1 0
any_work 2017
no_work 2014
scav_find_work 1004
```
Test Plan:
./validate
Check it builds with #define PROF_SPIN removed from includes/rts/Config.h
Reviewers: bgamari, erikd, simonmar, hvr
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #3553, #9221
Differential Revision: https://phabricator.haskell.org/D4302
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: austin, erikd, simonmar
Reviewed By: simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3985
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
And use to mark `stg_stack_underflow_frame`, which we are unable to
determine a caller from.
To simplify parsing at the moment we steal the `return` keyword to
indicate an undefined unwind value. Perhaps this should be revisited.
Reviewers: scpmw, simonmar, austin, erikd
Subscribers: dfeuer, thomie
Differential Revision: https://phabricator.haskell.org/D2738
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* In stg_ap_0_fast, if we're evaluating a thunk, the thunk might
evaluate to a function in which case we may have to adjust its CCS.
* The interpreter has its own implementation of stg_ap_0_fast, so we
have to do the same shenanigans with creating empty PAPs and copying
PAPs there.
* GHCi creates Cost Centres as children of CCS_MAIN, which enterFunCCS()
wrongly assumed to imply that they were CAFs. Now we use the is_caf
flag for this, which we have to correctly initialise when we create a
Cost Centre in GHCi.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This commit makes various improvements and addresses some issues with
Compact Regions (aka Compact Normal Forms).
This was the most important thing I wanted to fix. Compaction
previously prevented GC from running until it was complete, which
would be a problem in a multicore setting. Now, we compact using a
hand-written Cmm routine that can be interrupted at any point. When a
GC is triggered during a sharing-enabled compaction, the GC has to
traverse and update the hash table, so this hash table is now stored
in the StgCompactNFData object.
Previously, compaction consisted of a deepseq using the NFData class,
followed by a traversal in C code to copy the data. This is now done
in a single pass with hand-written Cmm (see rts/Compact.cmm). We no
longer use the NFData instances, instead the Cmm routine evaluates
components directly as it compacts.
The new compaction is about 50% faster than the old one with no
sharing, and a little faster on average with sharing (the cost of the
hash table dominates when we're doing sharing).
Static objects that don't (transitively) refer to any CAFs don't need
to be copied into the compact region. In particular this means we
often avoid copying Char values and small Int values, because these
are static closures in the runtime.
Each Compact# object can support a single compactAdd# operation at any
given time, so the Data.Compact library now enforces mutual exclusion
using an MVar stored in the Compact object.
We now get exceptions rather than killing everything with a barf()
when we encounter an object that cannot be compacted (a function, or a
mutable object). We now also detect pinned objects, which can't be
compacted either.
The Data.Compact API has been refactored and cleaned up. A new
compactSize operation returns the size (in bytes) of the compact
object.
Most of the documentation is in the Haddock docs for the compact
library, which I've expanded and improved here.
Various comments in the code have been improved, especially the main
Note [Compact Normal Forms] in rts/sm/CNF.c.
I've added a few tests, and expanded a few of the tests that were
there. We now also run the tests with GHCi, and in a new test way
that enables sanity checking (+RTS -DS).
There's a benchmark in libraries/compact/tests/compact_bench.hs for
measuring compaction speed and comparing sharing vs. no sharing.
The field totalDataW in StgCompactNFData was unnecessary.
Test Plan:
* new unit tests
* validate
* tested manually that we can compact Data.Aeson data
Reviewers: gcampax, bgamari, ezyang, austin, niteria, hvr, erikd
Subscribers: thomie, simonpj
Differential Revision: https://phabricator.haskell.org/D2751
GHC Trac Issues: #12455
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We currently have two info tables for a constructor
* XXX_con_info: the info table for a heap-resident instance of the
constructor, It has type CONSTR, or one of the specialised types like
CONSTR_1_0
* XXX_static_info: the info table for a static instance of this
constructor, which has type CONSTR_STATIC or CONSTR_STATIC_NOCAF.
I'm getting rid of the latter, and using the `con_info` info table for
both static and dynamic constructors. For rationale and more details
see Note [static constructors] in SMRep.hs.
I also removed these macros: `isSTATIC()`, `ip_STATIC()`,
`closure_STATIC()`, since they relied on the CONSTR/CONSTR_STATIC
distinction, and anyway HEAP_ALLOCED() does the same job.
Test Plan: validate
Reviewers: bgamari, simonpj, austin, gcampax, hvr, niteria, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2690
GHC Trac Issues: #12455
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of stg_interp_constr_entry there are now 7 functions (one for
each value of the tag bits) that tag the constructor pointer before
returning. This is consistent with compiled constructors' entry code,
and expectations that compiled code places on compiled constructors. The
iserv protocol is extended with an extra field that explains what
pointer tag the constructor should use.
Test Plan: Added tests for #12523
Reviewers: erikd, bgamari, hvr, austin, simonmar
Reviewed By: simonmar
Subscribers: osa1, thomie, rwbarton
Differential Revision: https://phabricator.haskell.org/D2473
GHC Trac Issues: #12523
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea behind adding special "rubbish" arguments was in unboxed sum types
depending on the tag some arguments are not used and we don't want to move some
special values (like 0 for literals and some special pointer for boxed slots)
for those arguments (to stack locations or registers). "StgRubbishArg" was an
indicator to the code generator that the value won't be used. During Stg-to-Cmm
we were then not generating any move or store instructions at all.
This caused problems in the register allocator because some variables were only
initialized in some code paths. As an example, suppose we have this STG: (after
unarise)
Lib.$WT =
\r [dt_sit]
case
case dt_sit of {
Lib.F dt_siv [Occ=Once] ->
(#,,#) [1# dt_siv StgRubbishArg::GHC.Prim.Int#];
Lib.I dt_siw [Occ=Once] ->
(#,,#) [2# StgRubbishArg::GHC.Types.Any dt_siw];
}
of
dt_six
{ (#,,#) us_giC us_giD us_giE -> Lib.T [us_giC us_giD us_giE];
};
This basically unpacks a sum type to an unboxed sum with 3 fields, and then
moves the unboxed sum to a constructor (`Lib.T`).
This is the Cmm for the inner case expression (case expression in the scrutinee
position of the outer case):
ciN:
...
-- look at dt_sit's tag
if (_ciT::P64 != 1) goto ciS; else goto ciR;
ciS: -- Tag is 2, i.e. Lib.F
_siw::I64 = I64[_siu::P64 + 6];
_giE::I64 = _siw::I64;
_giD::P64 = stg_RUBBISH_ENTRY_info;
_giC::I64 = 2;
goto ciU;
ciR: -- Tag is 1, i.e. Lib.I
_siv::P64 = P64[_siu::P64 + 7];
_giD::P64 = _siv::P64;
_giC::I64 = 1;
goto ciU;
Here one of the blocks `ciS` and `ciR` is executed and then the execution
continues to `ciR`, but only `ciS` initializes `_giE`, in the other branch
`_giE` is not initialized, because it's "rubbish" in the STG and so we don't
generate an assignment during code generator. The code generator then panics
during the register allocations:
ghc-stage1: panic! (the 'impossible' happened)
(GHC version 8.1.20160722 for x86_64-unknown-linux):
LocalReg's live-in to graph ciY {_giE::I64}
(`_giD` is also "rubbish" in `ciS`, but it's still initialized because it's a
pointer slot, we have to initialize it otherwise garbage collector follows the
pointer to some random place. So we only remove assignment if the "rubbish" arg
has unboxed type.)
This patch removes `StgRubbishArg` and `CmmArg`. We now always initialize
rubbish slots. If the slot is for boxed types we use the existing `absentError`,
otherwise we initialize the slot with literal 0.
Reviewers: simonpj, erikd, austin, simonmar, bgamari
Reviewed By: erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2446
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch implements primitive unboxed sum types, as described in
https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes.
Main changes are:
- Add new syntax for unboxed sums types, terms and patterns. Hidden
behind `-XUnboxedSums`.
- Add unlifted unboxed sum type constructors and data constructors,
extend type and pattern checkers and desugarer.
- Add new RuntimeRep for unboxed sums.
- Extend unarise pass to translate unboxed sums to unboxed tuples right
before code generation.
- Add `StgRubbishArg` to `StgArg`, and a new type `CmmArg` for better
code generation when sum values are involved.
- Add user manual section for unboxed sums.
Some other changes:
- Generalize `UbxTupleRep` to `MultiRep` and `UbxTupAlt` to
`MultiValAlt` to be able to use those with both sums and tuples.
- Don't use `tyConPrimRep` in `isVoidTy`: `tyConPrimRep` is really
wrong, given an `Any` `TyCon`, there's no way to tell what its kind
is, but `kindPrimRep` and in turn `tyConPrimRep` returns `PtrRep`.
- Fix some bugs on the way: #12375.
Not included in this patch:
- Update Haddock for new the new unboxed sum syntax.
- `TemplateHaskell` support is left as future work.
For reviewers:
- Front-end code is mostly trivial and adapted from unboxed tuple code
for type checking, pattern checking, renaming, desugaring etc.
- Main translation routines are in `RepType` and `UnariseStg`.
Documentation in `UnariseStg` should be enough for understanding
what's going on.
Credits:
- Johan Tibell wrote the initial front-end and interface file
extensions.
- Simon Peyton Jones reviewed this patch many times, wrote some code,
and helped with debugging.
Reviewers: bgamari, alanz, goldfire, RyanGlScott, simonpj, austin,
simonmar, hvr, erikd
Reviewed By: simonpj
Subscribers: Iceland_jack, ggreif, ezyang, RyanGlScott, goldfire,
thomie, mpickering
Differential Revision: https://phabricator.haskell.org/D2259
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This brings in initial support for compact regions, as described in the
ICFP 2015 paper "Efficient Communication and Collection with Compact
Normal Forms" (Edward Z. Yang et.al.) and implemented by Giovanni
Campagna.
Some things may change before the 8.2 release, but I (Simon M.) wanted
to get the main patch committed so that we can iterate.
What documentation there is is in the Data.Compact module in the new
compact package. We'll need to extend and polish the documentation
before the release.
Test Plan:
validate
(new test cases included)
Reviewers: ezyang, simonmar, hvr, bgamari, austin
Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd
Differential Revision: https://phabricator.haskell.org/D1264
GHC Trac Issues: #11493
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
stg_stack_underflow_frame had an incorrect
call of C function 'threadStackUnderflow':
("ptr" ret_off) =
foreign "C" threadStackUnderflow(
MyCapability(),
CurrentTSO);
Which means it's prototype is:
void * (*) (W_, void*);
While real prototype is:
W_ (*) (Capability *cap, StgTSO *tso);
The fix is simple. Fix type annotations:
(ret_off) =
foreign "C" threadStackUnderflow(
MyCapability() "ptr",
CurrentTSO "ptr");
Noticed when debugged T9045 test failure
on m68k target which distincts between
pointer and non pointer return types
(uses different registers)
While at it noticed and fixed return types
for 'throwTo' and 'findSpark'.
Trac #11395
Signed-off-by: Sergei Trofimovich <siarheit@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
it seems that this closure type has not been in use since 5d52d9, so all
this is dead and untested code. This removes it. Some of the code might
be useful for a counting indirection as described in #10613, so when
implementing that, have a look at what this commit removes.
Test Plan: validate on harbormaster
Reviewers: austin, bgamari, simonmar
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1821
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: validate
Reviewers: austin, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1047
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unwind information allows the debugger to discover more information
about a program state, by allowing it to "reconstruct" other states of
the program. In practice, this means that we explain to the debugger
how to unravel stack frames, which comes down mostly to explaining how
to find their Sp and Ip register values.
* We declare yet another new constructor for CmmNode - and this time
there's actually little choice, as unwind information can and will
change mid-block. We don't actually make use of these capabilities,
and back-end support would be tricky (generate new labels?), but it
feels like the right way to do it.
* Even though we only use it for Sp so far, we allow CmmUnwind to specify
unwind information for any register. This is pretty cheap and could
come in useful in future.
* We allow full CmmExpr expressions for specifying unwind values. The
advantage here is that we don't have to make up new syntax, and can e.g.
use the WDS macro directly. On the other hand, the back-end will now
have to simplify the expression until it can sensibly be converted
into DWARF byte code - a process which might fail, yielding NCG panics.
On the other hand, when you're writing Cmm by hand you really ought to
know what you're doing.
(From Phabricator D169)
|
| |
|
|
|
|
| |
This reverts commit 3b5a840bba375c4c4c11ccfeb283f84c3a1ef22c.
|
|
|
|
|
|
|
| |
This reverts commit 35672072b4091d6f0031417bc160c568f22d0469.
Conflicts:
compiler/main/DriverPipeline.hs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In preparation for indirecting all references to closures,
we rename _closure to _static_closure to ensure any old code
will get an undefined symbol error. In order to reference
a closure foobar_closure (which is now undefined), you should instead
use STATIC_CLOSURE(foobar). For convenience, a number of these
old identifiers are macro'd.
Across C-- and C (Windows and otherwise), there were differing
conventions on whether or not foobar_closure or &foobar_closure
was the address of the closure. Now, all foobar_closure references
are addresses, and no & is necessary.
CHARLIKE/INTLIKE were not changed, simply alpha-renamed.
Part of remove HEAP_ALLOCED patch set (#8199)
Depends on D265
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
Test Plan: validate
Reviewers: simonmar, austin
Subscribers: simonmar, ezyang, carter, thomie
Differential Revision: https://phabricator.haskell.org/D267
GHC Trac Issues: #8199
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Previously, there were two variants of CLOSURE in C--:
- Top-level CLOSURE(foo_closure, foo, lits...), which defines a new
static closure and gives it a name, and
- Array CLOSURE(foo, lits...), which was used for the static char
and integer arrays.
They used the same name, were confusing, and didn't even generate
the correct internal label representation! So now, we have two
new forms:
- Top-level CLOSURE(foo, lits...) which automatically generates
foo_closure (along with foo_info, which we were doing already)
- Array ANONYMOUS_CLOSURE(foo, lits...) which doesn't generate
a foo_closure identifier.
Part of remove HEAP_ALLOCED patch set (#8199)
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
Test Plan: validate
Reviewers: simonmar, austin
Subscribers: simonmar, ezyang, carter, thomie
Differential Revision: https://phabricator.haskell.org/D264
GHC Trac Issues: #8199
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These array types are smaller than Array# and MutableArray# and are
faster when the array size is small, as they don't have the overhead
of a card table. Having no card table reduces the closure size with 2
words in the typical small array case and leads to less work when
updating or GC:ing the array.
Reduces both the runtime and memory allocation by 8.8% on my insert
benchmark for the HashMap type in the unordered-containers package,
which makes use of lots of small arrays. With tuned GC settings
(i.e. `+RTS -A6M`) the runtime reduction is 15%.
Fixes #8923.
|
|
|
|
|
| |
Signed-off-by: Arash Rouhani <rarash@student.chalmers.se>
Reviewed-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This resurrects some old code and makes it work again. The idea is
that we want to get an error message if we ever enter a CAF that has
been GC'd, rather than following its indirection which will likely
cause a segfault. Without this patch, these bugs are hard to track
down in gdb, because the IND_STATIC code overwrites R1 (the pointer to
the CAF) with its indirectee before jumping into bad memory, so we've
lost the address of the CAF that got GC'd.
Some associated refactoring while I was here.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The commit replaces mkWeakForeignEnv# with addCFinalizerToWeak#.
This new primop mutates an existing Weak# object and adds a new
C finalizer to it.
This change removes an invariant in MarkWeak.c, namely that the relative
order of Weak# objects in the list needs to be preserved across GC. This
makes it easier to split the list into per-generation structures.
The patch also removes a race condition between two threads calling
finalizeWeak# on the same WEAK object at that same time.
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, threads blocked on an STM retry would be sent a wakeup
message each time an unpark was requested. This could result in the
accumulation of a large number of wake-up messages, which would slow
wake-up once the sleeping thread is finally scheduled.
Here, we introduce a new closure type, STM_AWOKEN, which marks a TSO
which has been sent a wake-up message, allowing us to send only one
wakeup.
|
|
|
|
|
|
|
|
|
|
| |
This improves GC performance when there are a lot of TVars in the
heap. For instance, a TChan with a lot of elements causes a massive
GC drag without this patch.
There's more to do - several other STM closure types don't have write
barriers, so GC performance when there are a lot of threads blocked on
STM isn't great. But fixing the problem for TVar is a good start.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main change here is that the Cmm parser now allows high-level cmm
code with argument-passing and function calls. For example:
foo ( gcptr a, bits32 b )
{
if (b > 0) {
// we can make tail calls passing arguments:
jump stg_ap_0_fast(a);
}
return (x,y);
}
More details on the new cmm syntax are in Note [Syntax of .cmm files]
in CmmParse.y.
The old syntax is still more-or-less supported for those occasional
code fragments that really need to explicitly manipulate the stack.
However there are a couple of differences: it is now obligatory to
give a list of live GlobalRegs on every jump, e.g.
jump %ENTRY_CODE(Sp(0)) [R1];
Again, more details in Note [Syntax of .cmm files].
I have rewritten most of the .cmm files in the RTS into the new
syntax, except for AutoApply.cmm which is generated by the genapply
program: this file could be generated in the new syntax instead and
would probably be better off for it, but I ran out of enthusiasm.
Some other changes in this batch:
- The PrimOp calling convention is gone, primops now use the ordinary
NativeNodeCall convention. This means that primops and "foreign
import prim" code must be written in high-level cmm, but they can
now take more than 10 arguments.
- CmmSink now does constant-folding (should fix #7219)
- .cmm files now go through the cmmPipeline, and as a result we
generate better code in many cases. All the object files generated
for the RTS .cmm files are now smaller. Performance should be
better too, but I haven't measured it yet.
- RET_DYN frames are removed from the RTS, lots of code goes away
- we now have some more canned GC points to cover unboxed-tuples with
2-4 pointers, which will reduce code size a little.
|
| |
|
|
|
|
|
| |
We may need to do this differently once we get as far as building the
RTS in the dyn ways.
|