| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Update Haddock submodule
Metric Increase:
haddock.compiler
|
|
|
|
| |
Fixes #11261
|
|
|
|
|
| |
Otherwise we are sensitive to libc's buffering strategy.
Similar to the issue fixed in 543dfaab166c81f46ac4af76918ce32190aaab22.
|
| |
|
|
|
|
| |
submodule updates: nofib, haddock
|
|
|
|
| |
Update haddock submodule
|
| |
|
|
|
|
| |
If hs_try_putmvar was called through an unsafe import, it would lose track of the running cap causing a deadlock
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently compacting GC has the invariant that in a chain all fields are tagged
the same. However this does not really hold: root pointers are not tagged, so
when we thread a root we initialize a chain without a tag. When the pointed
objects is evaluated and we have more pointers to it from the heap, we then add
*tagged* fields to the chain (because pointers to it from the heap are tagged),
ending up chaining fields with different tags (pointers from roots are NOT
tagged, pointers from heap are). This breaks the invariant and as a result
compacting GC turns tagged pointers into non-tagged.
This later causes problem in the generated code where we do reads assuming that
the pointer is aligned, e.g.
0x7(%rax) -- assumes that pointer is tagged 1
which causes misaligned reads. This caused #17088.
We fix this using the "pointer tagging for large families" patch (#14373,
!1742):
- With the pointer tagging patch the GC can know what the tagged pointer to a
CONSTR should be (previously we'd need to know the family size -- large
families are always tagged 1, small families are tagged depending on the
constructor).
- Since we now know what the tags should be we no longer need to store the
pointer tag in the info table pointers when forming chains in the compacting
GC.
As a result we no longer need to tag pointers in chains with 1/2 depending on
whether the field points to an info table pointer, or to another field: an info
table pointer is always tagged 0, everything else in the chain is tagged 1. The
lost tags in pointers can be retrieved by looking at the info table.
Finally, instead of using tag 1 for fields and tag 0 for info table pointers, we
use two different tags for fields:
- 1 for fields that have untagged pointers
- 2 for fields that have tagged pointers
When unchaining we then look at the pointer to a field, and depending on its tag
we either leave a tagged pointer or an untagged pointer in the field.
This allows chaining untagged and tagged fields together in compacting GC.
Fixes #17088
Nofib results
-------------
Binaries are smaller because of smaller `Compact.c` code.
make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" EXTRA_HC_OPTS="-with-rtsopts=-c" NoFibRuns=1
--------------------------------------------------------------------------------
Program Size Allocs Instrs Reads Writes
--------------------------------------------------------------------------------
CS -0.3% 0.0% +0.0% +0.0% +0.0%
CSD -0.3% 0.0% +0.0% +0.0% +0.0%
FS -0.3% 0.0% +0.0% -0.0% -0.0%
S -0.3% 0.0% +5.4% +0.8% +3.9%
VS -0.3% 0.0% +0.0% -0.0% -0.0%
VSD -0.3% 0.0% -0.0% -0.0% -0.2%
VSM -0.3% 0.0% +0.0% +0.0% +0.0%
anna -0.1% 0.0% +0.0% +0.0% +0.0%
ansi -0.3% 0.0% +0.1% +0.0% +0.0%
atom -0.2% 0.0% +0.0% +0.0% +0.0%
awards -0.2% 0.0% +0.0% 0.0% -0.0%
banner -0.3% 0.0% +0.0% +0.0% +0.0%
bernouilli -0.3% 0.0% +0.1% +0.0% +0.0%
binary-trees -0.2% 0.0% +0.0% 0.0% +0.0%
boyer -0.3% 0.0% +0.2% +0.0% +0.0%
boyer2 -0.2% 0.0% +0.2% +0.1% +0.0%
bspt -0.2% 0.0% +0.0% +0.0% +0.0%
cacheprof -0.2% 0.0% +0.0% +0.0% +0.0%
calendar -0.3% 0.0% +0.0% +0.0% +0.0%
cichelli -0.3% 0.0% +1.1% +0.2% +0.5%
circsim -0.2% 0.0% +0.0% -0.0% -0.0%
clausify -0.3% 0.0% +0.0% -0.0% -0.0%
comp_lab_zift -0.2% 0.0% +0.0% +0.0% +0.0%
compress -0.3% 0.0% +0.0% +0.0% +0.0%
compress2 -0.3% 0.0% +0.0% -0.0% -0.0%
constraints -0.3% 0.0% +0.2% +0.1% +0.1%
cryptarithm1 -0.3% 0.0% +0.0% -0.0% 0.0%
cryptarithm2 -0.3% 0.0% +0.0% +0.0% +0.0%
cse -0.3% 0.0% +0.0% +0.0% +0.0%
digits-of-e1 -0.3% 0.0% +0.0% +0.0% +0.0%
digits-of-e2 -0.3% 0.0% +0.0% +0.0% -0.0%
dom-lt -0.2% 0.0% +0.0% +0.0% +0.0%
eliza -0.2% 0.0% +0.0% +0.0% +0.0%
event -0.3% 0.0% +0.1% +0.0% -0.0%
exact-reals -0.2% 0.0% +0.0% +0.0% +0.0%
exp3_8 -0.3% 0.0% +0.0% +0.0% +0.0%
expert -0.2% 0.0% +0.0% +0.0% +0.0%
fannkuch-redux -0.3% 0.0% -0.0% -0.0% -0.0%
fasta -0.3% 0.0% +0.0% +0.0% +0.0%
fem -0.2% 0.0% +0.1% +0.0% +0.0%
fft -0.2% 0.0% +0.0% -0.0% -0.0%
fft2 -0.2% 0.0% +0.0% -0.0% +0.0%
fibheaps -0.3% 0.0% +0.0% -0.0% -0.0%
fish -0.3% 0.0% +0.0% +0.0% +0.0%
fluid -0.2% 0.0% +0.4% +0.1% +0.1%
fulsom -0.2% 0.0% +0.0% +0.0% +0.0%
gamteb -0.2% 0.0% +0.1% +0.0% +0.0%
gcd -0.3% 0.0% +0.0% +0.0% +0.0%
gen_regexps -0.3% 0.0% +0.0% -0.0% -0.0%
genfft -0.3% 0.0% +0.0% +0.0% +0.0%
gg -0.2% 0.0% +0.7% +0.3% +0.2%
grep -0.2% 0.0% +0.0% +0.0% +0.0%
hidden -0.2% 0.0% +0.0% +0.0% +0.0%
hpg -0.2% 0.0% +0.1% +0.0% +0.0%
ida -0.3% 0.0% +0.0% +0.0% +0.0%
infer -0.2% 0.0% +0.0% -0.0% -0.0%
integer -0.3% 0.0% +0.0% +0.0% +0.0%
integrate -0.2% 0.0% +0.0% +0.0% +0.0%
k-nucleotide -0.2% 0.0% +0.0% +0.0% -0.0%
kahan -0.3% 0.0% -0.0% -0.0% -0.0%
knights -0.3% 0.0% +0.0% -0.0% -0.0%
lambda -0.3% 0.0% +0.0% -0.0% -0.0%
last-piece -0.3% 0.0% +0.0% +0.0% +0.0%
lcss -0.3% 0.0% +0.0% +0.0% 0.0%
life -0.3% 0.0% +0.0% -0.0% -0.0%
lift -0.2% 0.0% +0.0% +0.0% +0.0%
linear -0.2% 0.0% +0.0% +0.0% +0.0%
listcompr -0.3% 0.0% +0.0% +0.0% +0.0%
listcopy -0.3% 0.0% +0.0% +0.0% +0.0%
maillist -0.3% 0.0% +0.0% -0.0% -0.0%
mandel -0.2% 0.0% +0.0% +0.0% +0.0%
mandel2 -0.3% 0.0% +0.0% +0.0% +0.0%
mate -0.2% 0.0% +0.0% +0.0% +0.0%
minimax -0.3% 0.0% +0.0% +0.0% +0.0%
mkhprog -0.2% 0.0% +0.0% +0.0% +0.0%
multiplier -0.3% 0.0% +0.0% -0.0% -0.0%
n-body -0.2% 0.0% -0.0% -0.0% -0.0%
nucleic2 -0.2% 0.0% +0.0% +0.0% +0.0%
para -0.2% 0.0% +0.0% -0.0% -0.0%
paraffins -0.3% 0.0% +0.0% -0.0% -0.0%
parser -0.2% 0.0% +0.0% +0.0% +0.0%
parstof -0.2% 0.0% +0.8% +0.2% +0.2%
pic -0.2% 0.0% +0.1% -0.1% -0.1%
pidigits -0.3% 0.0% +0.0% +0.0% +0.0%
power -0.2% 0.0% +0.0% -0.0% -0.0%
pretty -0.3% 0.0% -0.0% -0.0% -0.1%
primes -0.3% 0.0% +0.0% +0.0% -0.0%
primetest -0.2% 0.0% +0.0% -0.0% -0.0%
prolog -0.3% 0.0% +0.0% -0.0% -0.0%
puzzle -0.3% 0.0% +0.0% +0.0% +0.0%
queens -0.3% 0.0% +0.0% +0.0% +0.0%
reptile -0.2% 0.0% +0.2% +0.1% +0.0%
reverse-complem -0.3% 0.0% +0.0% +0.0% +0.0%
rewrite -0.3% 0.0% +0.0% -0.0% -0.0%
rfib -0.2% 0.0% +0.0% +0.0% -0.0%
rsa -0.2% 0.0% +0.0% +0.0% +0.0%
scc -0.3% 0.0% -0.0% -0.0% -0.1%
sched -0.3% 0.0% +0.0% +0.0% +0.0%
scs -0.2% 0.0% +0.1% +0.0% +0.0%
simple -0.2% 0.0% +3.4% +1.0% +1.8%
solid -0.2% 0.0% +0.0% +0.0% +0.0%
sorting -0.3% 0.0% +0.0% +0.0% +0.0%
spectral-norm -0.2% 0.0% -0.0% -0.0% -0.0%
sphere -0.2% 0.0% +0.0% +0.0% +0.0%
symalg -0.2% 0.0% +0.0% +0.0% +0.0%
tak -0.3% 0.0% +0.0% +0.0% -0.0%
transform -0.2% 0.0% +0.2% +0.1% +0.1%
treejoin -0.3% 0.0% +0.2% -0.0% -0.1%
typecheck -0.3% 0.0% +0.0% +0.0% +0.0%
veritas -0.1% 0.0% +0.0% +0.0% +0.0%
wang -0.2% 0.0% +0.0% -0.0% -0.0%
wave4main -0.3% 0.0% +0.0% -0.0% -0.0%
wheel-sieve1 -0.3% 0.0% +0.0% -0.0% -0.0%
wheel-sieve2 -0.3% 0.0% +0.0% -0.0% -0.0%
x2n1 -0.3% 0.0% +0.0% +0.0% +0.0%
--------------------------------------------------------------------------------
Min -0.3% 0.0% -0.0% -0.1% -0.2%
Max -0.1% 0.0% +5.4% +1.0% +3.9%
Geometric Mean -0.3% -0.0% +0.1% +0.0% +0.1%
--------------------------------------------------------------------------------
Program Size Allocs Instrs Reads Writes
--------------------------------------------------------------------------------
circsim -0.2% 0.0% +1.6% +0.4% +0.7%
constraints -0.3% 0.0% +4.3% +1.5% +2.3%
fibheaps -0.3% 0.0% +3.5% +1.2% +1.3%
fulsom -0.2% 0.0% +3.6% +1.2% +1.8%
gc_bench -0.3% 0.0% +4.1% +1.3% +2.3%
hash -0.3% 0.0% +6.6% +2.2% +3.6%
lcss -0.3% 0.0% +0.7% +0.2% +0.7%
mutstore1 -0.3% 0.0% +4.8% +1.4% +2.8%
mutstore2 -0.3% 0.0% +3.4% +1.0% +1.7%
power -0.2% 0.0% +2.7% +0.6% +1.9%
spellcheck -0.3% 0.0% +1.1% +0.4% +0.4%
--------------------------------------------------------------------------------
Min -0.3% 0.0% +0.7% +0.2% +0.4%
Max -0.2% 0.0% +6.6% +2.2% +3.6%
Geometric Mean -0.3% +0.0% +3.3% +1.0% +1.8%
Metric changes
--------------
While it sounds ridiculous, this change causes increased allocations in
the following tests. We concluded that this change can't cause a
difference in allocations and decided to land this patch. Fluctuations
in "bytes allocated" metric is tracked in #17686.
Metric Increase:
Naperian
T10547
T12150
T12234
T12425
T13035
T5837
T6048
|
|
|
|
|
|
| |
LLVM does not guarantee any particular semantics when dereferencing null
pointers. Consequently, this test actually passes when built with the
LLVM backend.
|
|
|
|
|
| |
`T5435_v_asm_a`, `T5435_v_asm_b`, and `T5435_v_gcc` all fail on ARMv7.
See #17559.
|
|
|
|
|
| |
The LLVM backend does not guarantee any particular semantics for
division by zero, making this test unreliable across platforms.
|
|
|
|
|
|
| |
This exposes a set of interfaces from the GHC API for configuring
EventLogWriters. These can be used by consumers like
[ghc-eventlog-socket](https://github.com/bgamari/ghc-eventlog-socket).
|
|
|
|
|
|
| |
Previously the returned tuple seemed to fit in registers on amd64. This
meant that non-moving collector bug would cause the test to fail on i386
yet not amd64.
|
|
|
|
| |
Due to #17447.
|
|
|
|
| |
<Rts.h> must always come first.
|
|
|
|
|
| |
Some tests depend on the RTS linker. Introduce a modifier to skip such
tests, in case the RTS linker is not available.
|
|
|
|
|
|
|
|
|
|
| |
The RTS linker is not available on 64-bit PowerPC. Instead of
marking tests that require the RTS linker as broken on PowerPC
64-bit skip the respective tests on all platforms where the
RTS linker or a statically linked external interpreter is not
available.
Fixes #11259
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This introduces a concurrent mark & sweep garbage collector to manage the old
generation. The concurrent nature of this collector typically results in
significantly reduced maximum and mean pause times in applications with large
working sets.
Due to the large and intricate nature of the change I have opted to
preserve the fully-buildable history, including merge commits, which is
described in the "Branch overview" section below.
Collector design
================
The full design of the collector implemented here is described in detail
in a technical note
> B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
> Compiler" (2018)
This document can be requested from @bgamari.
The basic heap structure used in this design is heavily inspired by
> K. Ueno & A. Ohori. "A fully concurrent garbage collector for
> functional programs on multicore processors." /ACM SIGPLAN Notices/
> Vol. 51. No. 9 (presented at ICFP 2016)
This design is intended to allow both marking and sweeping
concurrent to execution of a multi-core mutator. Unlike the Ueno design,
which requires no global synchronization pauses, the collector
introduced here requires a stop-the-world pause at the beginning and end
of the mark phase.
To avoid heap fragmentation, the allocator consists of a number of
fixed-size /sub-allocators/. Each of these sub-allocators allocators into
its own set of /segments/, themselves allocated from the block
allocator. Each segment is broken into a set of fixed-size allocation
blocks (which back allocations) in addition to a bitmap (used to track
the liveness of blocks) and some additional metadata (used also used
to track liveness).
This heap structure enables collection via mark-and-sweep, which can be
performed concurrently via a snapshot-at-the-beginning scheme (although
concurrent collection is not implemented in this patch).
Implementation structure
========================
The majority of the collector is implemented in a handful of files:
* `rts/Nonmoving.c` is the heart of the beast. It implements the entry-point
to the nonmoving collector (`nonmoving_collect`), as well as the allocator
(`nonmoving_allocate`) and a number of utilities for manipulating the heap.
* `rts/NonmovingMark.c` implements the mark queue functionality, update
remembered set, and mark loop.
* `rts/NonmovingSweep.c` implements the sweep loop.
* `rts/NonmovingScav.c` implements the logic necessary to scavenge the
nonmoving heap.
Branch overview
===============
```
* wip/gc/opt-pause:
| A variety of small optimisations to further reduce pause times.
|
* wip/gc/compact-nfdata:
| Introduce support for compact regions into the non-moving
|\ collector
| \
| \
| | * wip/gc/segment-header-to-bdescr:
| | | Another optimization that we are considering, pushing
| | | some segment metadata into the segment descriptor for
| | | the sake of locality during mark
| | |
| * | wip/gc/shortcutting:
| | | Support for indirection shortcutting and the selector optimization
| | | in the non-moving heap.
| | |
* | | wip/gc/docs:
| |/ Work on implementation documentation.
| /
|/
* wip/gc/everything:
| A roll-up of everything below.
|\
| \
| |\
| | \
| | * wip/gc/optimize:
| | | A variety of optimizations, primarily to the mark loop.
| | | Some of these are microoptimizations but a few are quite
| | | significant. In particular, the prefetch patches have
| | | produced a nontrivial improvement in mark performance.
| | |
| | * wip/gc/aging:
| | | Enable support for aging in major collections.
| | |
| * | wip/gc/test:
| | | Fix up the testsuite to more or less pass.
| | |
* | | wip/gc/instrumentation:
| | | A variety of runtime instrumentation including statistics
| | / support, the nonmoving census, and eventlog support.
| |/
| /
|/
* wip/gc/nonmoving-concurrent:
| The concurrent write barriers.
|
* wip/gc/nonmoving-nonconcurrent:
| The nonmoving collector without the write barriers necessary
| for concurrent collection.
|
* wip/gc/preparation:
| A merge of the various preparatory patches that aren't directly
| implementing the GC.
|
|
* GHC HEAD
.
.
.
```
|
| |
| |
| |
| | |
The nonmoving way finalizes things in a different order.
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This implements support for block group allocations which are aligned to
an integral number of blocks.
This will be used by the nonmoving garbage collector, which uses the
block allocator to allocate the segments which back its heap. These
segments are a fixed number of blocks in size, with each segment being
aligned to the segment size boundary. This allows us to easily find the
segment metadata stored at the beginning of the segment.
|
|/ |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The CI job fails with:
+++ rts/T13676.run/T13676.run.stderr.normalised 2019-10-09 12:27:56.000000000 -0700
@@ -0,0 +1,4 @@
+dyld: Library not loaded: @rpath/libHShaskeline-0.7.5.0-ghc8.9.0.20191009.dylib
+ Referenced from: /Users/builder/builds/ewzE5N2p/0/ghc/ghc/inplace/lib/bin/ghc
+ Reason: image not found
+*** Exception: readCreateProcess: '/Users/builder/builds/ewzE5N2p/0/ghc/ghc/inplace/lib/bin/ghc' '-B/Users/builder/builds/ewzE5N2p/0/ghc/ghc/inplace/lib' '-e' ''/''$'/'' == '/''/x0024'/''' +RTS '-tT13676.t' (exit -6): failed
Unable to reproduce locally.
|
| |
|
|
|
|
| |
Zeros heap memory after gc freed it.
|
|
|
|
|
|
|
|
| |
These are unexploded minds as far as the linter is concerned. I don't
want to hit in my MRs by mistake!
I did this with `sed`, and then rolled back some changes in the docs,
config.guess, and the linter itself.
|
|
|
|
|
|
| |
Previously T5423 would fail to flush the printf output buffer.
Consequently it was platform-dependent whether the C or Haskell print
output would be emitted first.
|
|
|
|
|
|
| |
Previously there were a few cases where operations like `omit_ways`
were incorrectly passed a single way (e.g. `omit_ways('threaded2')`).
This won't work as the author expected.
|
|
|
|
| |
omit_ways expects a list but this was broken in several cases.
|
|
|
|
|
| |
It times out pretty reliably. It's not clear that much is gained by
running this test in the ghci way anyways.
|
|
|
|
|
| |
It was previously marked as broken but it passes non-deterministically.
See #2783.
|
| |
|
|
|
|
|
|
|
|
|
| |
We are iterating through all object code for each heap objects when
checking whether object code can be unloaded. For large projects in
GHCi, this can be very expensive due to the large number of object code
that needs to be loaded/unloaded. To speed it up, this arrangess all
mapped sections of unloaded object code in a sorted array and use binary
search to check if an address location fall on them.
|
| |
|
| |
|
|
|
|
| |
Point users to the right URL
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Place each section on a separate page to ensure required
alignment (wastes lots ot space, needs to be improved).
2. Unwire relocation logic from macho sections (the most fiddly part
is adjusting internal relocations).
Other todos:
0. Add a test for section alignment.
1. Investigate 32bit relocations!
2. Fix memory leak in ZEROPAGE section allocation.
3. Fix creating redundant jump islands for GOT.
4. Investigate more compact section placement.
|
|
|
|
|
|
|
| |
This test, which is only run on Windows, seems to be reliably timing
out.
See #16390.
|
|
|
|
|
| |
This moves all URL references to Trac tickets to their corresponding
GitLab counterparts.
|
|
|
|
|
| |
Also, replace some tabs with spaces to avoid a "mixed indent" warning that vim
gives me.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the work stealing queue a load-load-barrier is required to ensure
that a read of queue data cannot be reordered before a read of the
bottom pointer into the queue.
The added load-load-barrier ensures that the ordering of writes enforced
at the end of `pushWSDeque` is also respected in the order of reads in
`stealWSDeque_`. In other words, when reading `q->bottom` we want to make
sure that we see the updates to `q->elements`.
Fixes #13633
|
|
|
|
| |
14586f5d removed this by accident.
|
|
|
|
| |
See Trac #15072, Trac #16349, Trac #16350
|
|
|
|
|
| |
This eliminates most uses of run_command in the testsuite in favor of the more
structured makefile_test.
|
|
|
|
| |
This reverts commit 76c8fd674435a652c75a96c85abbf26f1f221876.
|