| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
macOS Catalina is finally going to force our hand in forbidding writable
executable mappings. Unfortunately, this is quite incompatible with the
current global m32 allocator, which mixes symbols from various objects
in a single page. The problem here is that some of these symbols may not
yet be resolved (e.g. had relocations performed) as this happens lazily
(and therefore we can't yet make the section read-only and therefore
executable).
The easiest way around this is to simply create one m32 allocator per
ObjectCode. This may slightly increase fragmentation for short-running
programs but I suspect will actually improve fragmentation for programs
doing lots of loading/unloading since we can always free all of the
pages allocated to an object when it is unloaded (although this ability
will only be implemented in a later patch).
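
A minimal sketch of the per-object idea, assuming a POSIX system. The
names (`ObjAllocator`, `obj_alloc_seal`, and so on) are hypothetical and
are not the RTS's actual m32 API. Each object gets a private mapping that
stays writable while relocations are applied, is then flipped to
read+execute in one step, and can be unmapped wholesale on unload:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical per-object allocator: one private mapping per loaded object. */
typedef struct {
    unsigned char *base; /* start of the mapping, or NULL */
    size_t size;         /* mapping size in bytes (a multiple of the page size) */
    size_t used;         /* bump-pointer offset into the mapping */
    int sealed;          /* nonzero once the mapping is read+execute */
} ObjAllocator;

/* Map `pages` writable (but not executable) pages for one object. */
static int obj_alloc_init(ObjAllocator *a, size_t pages) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, pages * page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    a->base = p;
    a->size = pages * page;
    a->used = 0;
    a->sealed = 0;
    return 0;
}

/* Bump-allocate n bytes (16-byte aligned); fails once the object is sealed. */
static void *obj_alloc(ObjAllocator *a, size_t n) {
    if (a->sealed || n > a->size) return NULL;
    size_t aligned = (n + 15) & ~(size_t)15;
    if (aligned > a->size - a->used) return NULL;
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* After all relocations are resolved: drop write, add execute. */
static int obj_alloc_seal(ObjAllocator *a) {
    if (mprotect(a->base, a->size, PROT_READ | PROT_EXEC) != 0) return -1;
    a->sealed = 1;
    return 0;
}

/* Unloading the object releases every page it owned. */
static int obj_alloc_free(ObjAllocator *a) {
    int r = munmap(a->base, a->size);
    a->base = NULL;
    a->size = a->used = 0;
    a->sealed = 0;
    return r;
}
```

Because no page is shared between objects, sealing one object never has
to wait for another object's lazy relocations.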
|
| |
|
|
|
|
|
|
|
| |
Previously we would allow the output from the check of SMP support
introduced by 83655b06e6d3e93b2d15bb0fa250fbb113d7fe68 to leak to
stdout. Silence this.
See #16873.
|
|
|
|
|
|
|
|
|
| |
We were using `appPrec`, not `sigPrec`, as the precedence when
determining whether or not to parenthesize `() :: Constraint`,
which led to the parentheses being omitted in function contexts
like `(() :: Constraint) => String`. Easily fixed.
Fixes #17403.
|
| |
|
|
|
|
|
|
| |
This was removed in b538476be3706264620c072e6e436debf9e0d3e4, but
without it the compare-flags.py script fails. This adds it back and
marks it as deprecated, with a notice that it is slated for removal.
|
|
|
|
| |
It should point to the _build directory, not the source
|
| |
|
|
|
|
| |
This fixes a hadrian `build docs` failure
|
|
|
|
|
| |
These were probably added with some GLOBAL_VARs, but those GLOBAL_VARs
are now gone.
|
|
|
|
|
|
|
| |
* Prefer #pragma once over guard macros
* Drop redundant #includes
* Fix order to ensure that necessary macros are defined when we
condition on them
|
| |
|
|
|
|
|
| |
This is a unit test for the native code generator's register allocator;
naturally, the NCG is required.
|
| |
|
|
|
|
| |
[skip ci]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`:steplocal` enables only breakpoints in the current top-level binding.
When a normal breakpoint is hit, then the module name and the break id from the `BRK_FUN` byte code
allow us to access the corresponding entry in a ModBreak table. From this entry we then get the SrcSpan
(see compiler/main/InteractiveEval.hs:bindLocalsAtBreakpoint).
With this source-span we can then determine the current top-level binding, needed for the steplocal command.
However, if we break at an exception or at an error, we don't have a `BRK_FUN` byte code, so we don't have any source information.
The function `bindLocalsAtBreakpoint` creates an `UnhelpfulSpan`, which doesn't allow us to determine the current top-level binding.
To avoid a `panic`, we have to check for `UnhelpfulSpan` in the function `ghc/GHCi/UI.hs:stepLocalCmd`.
Hence a :steplocal command after a break-on-exception or a break-on-error is not possible.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a part of GHC Proposal #25: "Offer more array resizing primitives".
Resources related to the proposal:
- Discussion: https://github.com/ghc-proposals/ghc-proposals/pull/121
- Proposal: https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0025-resize-boxed.rst
Only shrinkSmallMutableArray# is implemented as a primop since a
library-space implementation of resizeSmallMutableArray# (in GHC.Exts)
is no less efficient than a primop would be. This may be replaced by
a primop in the future if someone devises a strategy for growing
arrays in-place. The library-space implementation always copies the
array when growing it.
This commit also tweaks the documentation of the deprecated
sizeofMutableByteArray#, removing the mention of concurrency. That
primop is unsound even in single-threaded applications. Additionally,
the non-negativity assertion on the existing shrinkMutableByteArray#
primop has been removed since this predicate is trivially always true.
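
The shrink-in-place versus copy-on-grow split described above can be
illustrated with a small C analogue. This is only a sketch: `SmallArr`,
`arr_shrink` and `arr_resize` are hypothetical names, the real primops
operate on heap objects managed by the GC, and here the old array is
freed explicitly where GHC would leave it to the collector.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a small mutable array: a length header
 * followed by the elements. */
typedef struct {
    size_t len;
    int elems[];
} SmallArr;

static SmallArr *arr_new(size_t len, int fill) {
    SmallArr *a = malloc(sizeof *a + len * sizeof a->elems[0]);
    if (!a) return NULL;
    a->len = len;
    for (size_t i = 0; i < len; i++) a->elems[i] = fill;
    return a;
}

/* Shrinking only rewrites the length header; nothing is copied. */
static void arr_shrink(SmallArr *a, size_t new_len) {
    assert(new_len <= a->len);
    a->len = new_len;
}

/* Resizing shrinks in place when possible and otherwise always copies
 * into a fresh array, padding the new tail with `fill`. */
static SmallArr *arr_resize(SmallArr *a, size_t new_len, int fill) {
    if (new_len <= a->len) {
        arr_shrink(a, new_len);
        return a;
    }
    SmallArr *b = arr_new(new_len, fill);
    if (!b) return NULL;
    memcpy(b->elems, a->elems, a->len * sizeof a->elems[0]);
    free(a); /* in GHC the old array would simply become garbage */
    return b;
}
```

Note that the length is an unsigned `size_t` here, so a non-negativity
check on it would be vacuous in this sketch.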
|
| |
|
|
|
|
|
| |
GCC 4.6 was released 7 years ago. I think we can finally assume that
it's available. This is a simplification prompted by #15742.
|
| |
|
|
|
|
| |
with s/16/32)
|
|
|
|
| |
It looks like this use of `skip` snuck through my previous refactoring.
|
|
|
|
| |
Due to #17018.
|
| |
|
|
|
|
|
|
|
|
|
| |
This commit makes Hadrian use the `-dynamic-too` flag when the current
Flavour's libraryWays contains both vanilla and dynamic, cutting down
the amount of repeated work caused by separate compilation of dynamic
and static files. It does this for the basic case where '.o' and
'.dyn_o' files are built with one command, but does not generalise to
cases like '.prof_o' and '.prof_dyn_o'.
|
|
|
|
|
| |
We applied a similar fix for `ConT` in #15572 but forgot to apply the
fix to `InfixT` as well. This patch fixes #17394 by doing just that.
|
|
|
|
|
|
|
|
|
|
|
|
| |
`isTcLevPoly` gives an approximate answer for when a type constructor
is levity polymorphic when fully applied, where `True` means
"possibly levity polymorphic" and `False` means "definitely not
levity polymorphic". `isTcLevPoly` returned `False` for newtypes,
which is incorrect in the presence of `UnliftedNewtypes`, leading
to #17360. This patch tweaks `isTcLevPoly` to return `True` for
newtypes instead.
Fixes #17360.
|
|
|
|
|
|
|
| |
We were using `pprIfaceAppArgs` instead of `pprParendIfaceAppArgs`
in `pprIfaceConDecl`. Oops.
Fixes #17384.
|
|
|
|
| |
See #16873.
|
|\
| |
| | |
This introduces a concurrent mark & sweep garbage collector to manage the old
generation. The concurrent nature of this collector typically results in
significantly reduced maximum and mean pause times in applications with large
working sets.
Due to the large and intricate nature of the change I have opted to
preserve the fully-buildable history, including merge commits, which is
described in the "Branch overview" section below.
Collector design
================
The full design of the collector implemented here is described in detail
in a technical note
> B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
> Compiler" (2018)
This document can be requested from @bgamari.
The basic heap structure used in this design is heavily inspired by
> K. Ueno & A. Ohori. "A fully concurrent garbage collector for
> functional programs on multicore processors." /ACM SIGPLAN Notices/
> Vol. 51. No. 9 (presented at ICFP 2016)
This design is intended to allow both marking and sweeping to run
concurrently with the execution of a multi-core mutator. Unlike the Ueno design,
which requires no global synchronization pauses, the collector
introduced here requires a stop-the-world pause at the beginning and end
of the mark phase.
To avoid heap fragmentation, the allocator consists of a number of
fixed-size /sub-allocators/. Each of these sub-allocators allocates into
its own set of /segments/, themselves allocated from the block
allocator. Each segment is broken into a set of fixed-size allocation
blocks (which back allocations) in addition to a bitmap (used to track
the liveness of blocks) and some additional metadata (also used
to track liveness).
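
As a rough illustration of the segment layout just described, here is a
heavily simplified, single-threaded sketch. The type and function names,
the block count and size, and the three-state bitmap encoding are all
assumptions made for the example and do not mirror `rts/Nonmoving.c`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SEG_BLOCKS = 64, BLOCK_SIZE = 32 };          /* hypothetical sizes */
enum { BLOCK_FREE = 0, BLOCK_ALLOCATED = 1, BLOCK_MARKED = 2 };

/* One segment: a per-block bitmap, a little metadata, then the blocks. */
typedef struct {
    uint8_t bitmap[SEG_BLOCKS];   /* one state byte per block */
    unsigned next_free;           /* metadata: where allocation resumes */
    unsigned char blocks[SEG_BLOCKS][BLOCK_SIZE];
} Segment;

static void seg_init(Segment *s) {
    memset(s->bitmap, BLOCK_FREE, sizeof s->bitmap);
    s->next_free = 0;
}

/* Hand out the next free fixed-size block, or NULL if the segment is full. */
static void *seg_alloc(Segment *s) {
    for (unsigned i = s->next_free; i < SEG_BLOCKS; i++) {
        if (s->bitmap[i] == BLOCK_FREE) {
            s->bitmap[i] = BLOCK_ALLOCATED;
            s->next_free = i + 1;
            return s->blocks[i];
        }
    }
    return NULL;
}

/* Map a block pointer back to its index in the bitmap. */
static unsigned seg_index(const Segment *s, const void *p) {
    return (unsigned)(((const unsigned char *)p - &s->blocks[0][0]) / BLOCK_SIZE);
}

/* Mark phase: record that the block holding p is reachable. */
static void seg_mark(Segment *s, const void *p) {
    s->bitmap[seg_index(s, p)] = BLOCK_MARKED;
}

/* Sweep phase: unmarked blocks become free, marked ones lose their mark.
 * Returns the number of blocks reclaimed. Nothing is moved. */
static unsigned seg_sweep(Segment *s) {
    unsigned freed = 0;
    s->next_free = SEG_BLOCKS;
    for (unsigned i = 0; i < SEG_BLOCKS; i++) {
        if (s->bitmap[i] == BLOCK_ALLOCATED) {
            s->bitmap[i] = BLOCK_FREE;
            freed++;
        } else if (s->bitmap[i] == BLOCK_MARKED) {
            s->bitmap[i] = BLOCK_ALLOCATED;
        }
        if (s->bitmap[i] == BLOCK_FREE && i < s->next_free) s->next_free = i;
    }
    return freed;
}
```

Because every block in a segment has the same size, a reclaimed block can
be reused by any later allocation from that segment, which is what keeps
fragmentation bounded without moving objects.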
This heap structure enables collection via mark-and-sweep, which can be
performed concurrently via a snapshot-at-the-beginning scheme (although
concurrent collection is not implemented in this patch).
Implementation structure
========================
The majority of the collector is implemented in a handful of files:
* `rts/Nonmoving.c` is the heart of the beast. It implements the entry-point
to the nonmoving collector (`nonmoving_collect`), as well as the allocator
(`nonmoving_allocate`) and a number of utilities for manipulating the heap.
* `rts/NonmovingMark.c` implements the mark queue functionality, update
remembered set, and mark loop.
* `rts/NonmovingSweep.c` implements the sweep loop.
* `rts/NonmovingScav.c` implements the logic necessary to scavenge the
nonmoving heap.
Branch overview
===============
```
* wip/gc/opt-pause:
| A variety of small optimisations to further reduce pause times.
|
* wip/gc/compact-nfdata:
| Introduce support for compact regions into the non-moving
|\ collector
| \
| \
| | * wip/gc/segment-header-to-bdescr:
| | | Another optimization that we are considering, pushing
| | | some segment metadata into the segment descriptor for
| | | the sake of locality during mark
| | |
| * | wip/gc/shortcutting:
| | | Support for indirection shortcutting and the selector optimization
| | | in the non-moving heap.
| | |
* | | wip/gc/docs:
| |/ Work on implementation documentation.
| /
|/
* wip/gc/everything:
| A roll-up of everything below.
|\
| \
| |\
| | \
| | * wip/gc/optimize:
| | | A variety of optimizations, primarily to the mark loop.
| | | Some of these are microoptimizations but a few are quite
| | | significant. In particular, the prefetch patches have
| | | produced a nontrivial improvement in mark performance.
| | |
| | * wip/gc/aging:
| | | Enable support for aging in major collections.
| | |
| * | wip/gc/test:
| | | Fix up the testsuite to more or less pass.
| | |
* | | wip/gc/instrumentation:
| | | A variety of runtime instrumentation including statistics
| | / support, the nonmoving census, and eventlog support.
| |/
| /
|/
* wip/gc/nonmoving-concurrent:
| The concurrent write barriers.
|
* wip/gc/nonmoving-nonconcurrent:
| The nonmoving collector without the write barriers necessary
| for concurrent collection.
|
* wip/gc/preparation:
| A merge of the various preparatory patches that aren't directly
| implementing the GC.
|
|
* GHC HEAD
.
.
.
```
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Previously we would first move the new objects to their appropriate
non-moving GC list, then do another pass over that list to clear their
mark bits. This is needlessly expensive. First clear the mark bits of
the existing objects, then add the newly evacuated objects and, at the
same time, clear their mark bits.
This cuts the preparatory GC time in half for the Pusher benchmark with
a large queue size.
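
The reordering can be sketched on a plain linked list. This is an
illustration under assumed names (`Obj`, `prepare_list`), not the RTS's
data structures: existing objects have their mark bits cleared first, and
each newly evacuated object is cleared in the same step that links it in,
so it is never visited a second time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object header: a list link plus a mark bit. */
typedef struct Obj {
    struct Obj *next;
    int marked;
} Obj;

/* Clear the marks of the existing objects, then move the fresh objects
 * onto the list, clearing each one's mark as it is linked in.
 * Returns the new list head. */
static Obj *prepare_list(Obj *existing, Obj *fresh) {
    for (Obj *o = existing; o != NULL; o = o->next)
        o->marked = 0;
    while (fresh != NULL) {
        Obj *next = fresh->next;
        fresh->marked = 0;      /* cleared while we already hold the object */
        fresh->next = existing; /* push onto the front of the list */
        existing = fresh;
        fresh = next;
    }
    return existing;
}
```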
|
| | |
|
| |
| |
| |
| |
| |
| | |
The expectation here is that the nonmoving GC is latency-centric,
whereas the moving GC emphasizes throughput. Therefore we give the
latter the benefit of better static branch prediction.
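
A small sketch of what such a hint can look like with GCC or Clang. The
`UNLIKELY` macro, the `use_nonmoving` flag and `choose_heap` are
hypothetical names for illustration; only `__builtin_expect` itself is a
real compiler builtin:

```c
#include <assert.h>
#include <stdbool.h>

/* Tell the compiler which way a branch usually goes; fall back to a
 * plain test on compilers without the builtin. */
#if defined(__GNUC__)
#define UNLIKELY(p) __builtin_expect(!!(p), 0)
#else
#define UNLIKELY(p) (p)
#endif

enum { HEAP_MOVING = 0, HEAP_NONMOVING = 1 };
enum { OLDEST_GEN = 1 };

static bool use_nonmoving = false; /* hypothetical runtime flag */

/* The nonmoving check is hinted as the cold path, so the moving
 * collector's path is laid out as the fall-through case. */
static int choose_heap(int gen) {
    if (UNLIKELY(use_nonmoving && gen == OLDEST_GEN))
        return HEAP_NONMOVING;
    return HEAP_MOVING;
}
```

The hint changes only code layout and static prediction, not behaviour:
both paths still return the correct answer.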
|
| |
| |
| |
| |
| |
| | |
This largely follows the model used for large objects, with appropriate
adjustments made to account for references in the sharing deduplication
hashtable.
|
| |\ \
| | | |
| | | |
| | | | |
wip/gc/everything2
|
| | | | |
|
| | | | |
|
| | | | |
|
| | |/
| | |
| | |
| | | |
This will allow us to easily move the block size elsewhere.
|
| | | |
|
| | | |
|
| | | |
|
| |/
| |
| |
| |
| | |
This allows indirection chains residing in the non-moving heap to be
shorted-out.
|
| |\ \ |
|
| | | |
| | | |
| | | |
| | | | |
This is consistent with the other unoptimized ways.
|
| | | |
| | | |
| | | |
| | | | |
The nonmoving way finalizes things in a different order.
|
| | | |
| | | |
| | | |
| | | | |
The nonmoving collector doesn't support -G1
|
| | | | |
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The debug RTS initializes the heap with 0xaa, which breaks the
(admittedly rather fragile) assumption that uninitialized fields are set
to 0x00:
```
Wrong exit code for heap_all(nonmoving)(expected 0 , actual 1 )
Stderr ( heap_all ):
heap_all: user error (assertClosuresEq: Closures do not match
Expected: FunClosure {info = StgInfoTable {entry = Nothing, ptrs = 0, nptrs = 1, tipe = FUN_0_1, srtlen = 0, code = Nothing}, ptrArgs = [], dataArgs = [0]}
Actual: FunClosure {info = StgInfoTable {entry = Nothing, ptrs = 0, nptrs = 1, tipe = FUN_0_1, srtlen = 1032832, code = Nothing}, ptrArgs = [], dataArgs = [12297829382473034410]}
CallStack (from HasCallStack):
assertClosuresEq, called at heap_all.hs:230:9 in main:Main
)
```
|