The code is just more confusing than it needs to be. We don't need to mix
the threaded check with the LDV profiling check, since LDV's init already
checks for this; they can be two separate checks. Taking the sanity
checking into account is also cleaner via DebugFlags.sanity; there is no
need to check the DEBUG define.
The ZERO_SLOP_FOR_LDV_PROF and ZERO_SLOP_FOR_SANITY_CHECK definitions in the
old code also made things a lot more opaque IMO, so I removed them.
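A minimal sketch of the resulting shape, using stand-in names rather than
the actual RTS identifiers (only `DebugFlags.sanity` is taken from the
message): slop needs zeroing when LDV profiling is on or when sanity
checking is on, as two independent run-time checks.
```
#include <stdbool.h>

struct DebugFlags { bool sanity; };
extern struct DebugFlags debugFlags;   /* stand-in for the RTS flag container */
extern bool ldvProfilingEnabled;       /* stand-in for the LDV-profiling check */

static bool zeroSlopNeeded(void)
{
    /* Two separate checks instead of one condition mixing the threaded,
       LDV, and DEBUG concerns. */
    return ldvProfilingEnabled || debugFlags.sanity;
}
```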
This introduces a concurrent mark & sweep garbage collector to manage the old
generation. The concurrent nature of this collector typically results in
significantly reduced maximum and mean pause times in applications with large
working sets.
Due to the large and intricate nature of the change I have opted to
preserve the fully-buildable history, including merge commits, which is
described in the "Branch overview" section below.
Collector design
================
The full design of the collector implemented here is described in detail
in a technical note
> B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
> Compiler" (2018)
This document can be requested from @bgamari.
The basic heap structure used in this design is heavily inspired by
> K. Ueno & A. Ohori. "A fully concurrent garbage collector for
> functional programs on multicore processors." /ACM SIGPLAN Notices/
> Vol. 51. No. 9 (presented at ICFP 2016)
This design is intended to allow both marking and sweeping to proceed
concurrently with execution of a multi-core mutator. Unlike the Ueno design,
which requires no global synchronization pauses, the collector
introduced here requires a stop-the-world pause at the beginning and end
of the mark phase.
To avoid heap fragmentation, the allocator consists of a number of
fixed-size /sub-allocators/. Each of these sub-allocators allocates into
its own set of /segments/, themselves allocated from the block
allocator. Each segment is broken into a set of fixed-size allocation
blocks (which back allocations) in addition to a bitmap (used to track
the liveness of blocks) and some additional metadata (also used
to track liveness).
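A minimal sketch of that segment layout, with illustrative field names and
sizes rather than the collector's actual definitions:
```
#include <stdint.h>

#define SEGMENT_BLOCKS 512                /* illustrative block count */

struct Segment {
    uint16_t block_size;                  /* every block in a segment has this size */
    uint16_t next_free;                   /* allocator metadata, also liveness-related */
    uint8_t  bitmap[SEGMENT_BLOCKS];      /* one liveness (mark) entry per block */
    /* the fixed-size allocation blocks that back allocations follow here */
};
```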
This heap structure enables collection via mark-and-sweep, which can be
performed concurrently via a snapshot-at-the-beginning scheme (although
concurrent collection is not implemented in this patch).
Implementation structure
========================
The majority of the collector is implemented in a handful of files:
* `rts/Nonmoving.c` is the heart of the beast. It implements the entry-point
to the nonmoving collector (`nonmoving_collect`), as well as the allocator
(`nonmoving_allocate`) and a number of utilities for manipulating the heap.
* `rts/NonmovingMark.c` implements the mark queue functionality, update
remembered set, and mark loop.
* `rts/NonmovingSweep.c` implements the sweep loop.
* `rts/NonmovingScav.c` implements the logic necessary to scavenge the
nonmoving heap.
Branch overview
===============
```
* wip/gc/opt-pause:
| A variety of small optimisations to further reduce pause times.
|
* wip/gc/compact-nfdata:
| Introduce support for compact regions into the non-moving
|\ collector
| \
| \
| | * wip/gc/segment-header-to-bdescr:
| | | Another optimization that we are considering, pushing
| | | some segment metadata into the segment descriptor for
| | | the sake of locality during mark
| | |
| * | wip/gc/shortcutting:
| | | Support for indirection shortcutting and the selector optimization
| | | in the non-moving heap.
| | |
* | | wip/gc/docs:
| |/ Work on implementation documentation.
| /
|/
* wip/gc/everything:
| A roll-up of everything below.
|\
| \
| |\
| | \
| | * wip/gc/optimize:
| | | A variety of optimizations, primarily to the mark loop.
| | | Some of these are microoptimizations but a few are quite
| | | significant. In particular, the prefetch patches have
| | | produced a nontrivial improvement in mark performance.
| | |
| | * wip/gc/aging:
| | | Enable support for aging in major collections.
| | |
| * | wip/gc/test:
| | | Fix up the testsuite to more or less pass.
| | |
* | | wip/gc/instrumentation:
| | | A variety of runtime instrumentation including statistics
| | / support, the nonmoving census, and eventlog support.
| |/
| /
|/
* wip/gc/nonmoving-concurrent:
| The concurrent write barriers.
|
* wip/gc/nonmoving-nonconcurrent:
| The nonmoving collector without the write barriers necessary
| for concurrent collection.
|
* wip/gc/preparation:
| A merge of the various preparatory patches that aren't directly
| implementing the GC.
|
|
* GHC HEAD
.
.
.
```
This extends the non-moving collector to allow concurrent collection.
The full design of the collector implemented here is described in detail
in a technical note
B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
Compiler" (2018)
This extension involves the introduction of a capability-local
remembered set, known as the /update remembered set/, which tracks
objects which may no longer be visible to the collector due to mutation.
To maintain this remembered set we introduce a write barrier on
mutations which is enabled while a concurrent mark is underway.
The update remembered set representation is similar to that of the
nonmoving mark queue, being a chunked array of `MarkEntry`s. Each
`Capability` maintains a single accumulator chunk, which it flushes
when (a) the chunk is filled, or (b) the nonmoving collector enters its
post-mark synchronization phase.
While the write barrier touches a significant amount of code, it is
conceptually straightforward: the mutator must ensure that the referent
of any pointer it overwrites is added to the update remembered set
(a sketch of this barrier follows the list below).
However, there are a few details:
* In the case of objects with a dirty flag (e.g. `MVar`s) we can
exploit the fact that only the *first* mutation requires a write
barrier.
* Weak references, as usual, complicate things. In particular, we must
ensure that the referent of a weak object is marked if it is dereferenced
by the mutator. For this we (unfortunately) must introduce a read
barrier, as described in Note [Concurrent read barrier on deRefWeak#]
(in `NonMovingMark.c`).
* Stable names are also a bit tricky as described in Note [Sweeping
stable names in the concurrent collector] (`NonMovingSweep.c`).
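The sketch below shows the shape of such a snapshot-at-the-beginning
barrier; all names are assumptions for illustration and not GHC's actual
API.
```
typedef struct Closure_ Closure;

typedef struct {
    Closure **entries;                  /* current accumulator chunk */
    int       len, cap;
} UpdRemSet;

extern _Bool concurrentMarkRunning;     /* assumed: set while a mark is underway */

static void updRemSetPush(UpdRemSet *rs, Closure *old_referent)
{
    if (rs->len == rs->cap) {
        /* flush the full chunk to the mark queue (omitted), then reuse it */
        rs->len = 0;
    }
    rs->entries[rs->len++] = old_referent;
}

/* Barriered store: record what the field used to point to, then overwrite it. */
static void writeField(UpdRemSet *rs, Closure **field, Closure *new_val)
{
    if (concurrentMarkRunning && *field != 0)
        updRemSetPush(rs, *field);
    *field = new_val;
}
```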
We take quite some pains to ensure that the high thread count often seen
in parallel Haskell applications doesn't affect pause times. To this end
we allow thread stacks to be marked either by the thread itself (when it
is executed or when its stack underflows) or by the concurrent mark thread
(if the thread owning the stack is never scheduled). There is a non-trivial
handshake to ensure that this happens without racing, which is described
in Note [StgStack dirtiness flags and concurrent marking].
Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
Namely ensure that block descriptors are initialized with valid
generation numbers.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
This implements the correct fix for #11627 by skipping over the slop
(which is zeroed) rather than adding special-case logic for large
ARR_WORDS closures, which runs the risk of producing an incorrect census
by ignoring any subsequent blocks.
This approach implements logic similar to that in Sanity.c.
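A hedged sketch of the skipping idea, with assumed names rather than the
census's actual code:
```
#include <stdint.h>

typedef uintptr_t StgWord;

/* Slop left behind by the allocator is zeroed, and no valid info pointer
   is zero, so the census walk can simply step over zero words until it
   reaches the next closure. */
static StgWord *skipZeroedSlop(StgWord *p, StgWord *end)
{
    while (p < end && *p == 0)
        p++;
    return p;
}
```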
This:
- Hoists part of the condition outside of the initialization loop in
`stg_newSmallArrayzh`.
- Annotates one of the unlikely branches as unlikely, also in
`stg_newSmallArrayzh`.
- Adds a couple of annotations to `allocateMightFail` indicating which
branches are likely to be taken.
Together this gives about a 5% improvement (a sketch of both tweaks in C
follows).
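The sketch below illustrates the two techniques in plain C with made-up
names; the real change is in Cmm (`stg_newSmallArrayzh`) and in
`allocateMightFail`, so this is illustration only.
```
#define unlikely(x) __builtin_expect(!!(x), 0)

static int initSmallArray(void **payload, long n, void *init_elem,
                          int profiling /* loop-invariant */)
{
    if (unlikely(payload == 0))
        return -1;                    /* rare failure branch, annotated as such */

    /* Hoisted: test the loop-invariant condition once, not on every iteration. */
    if (profiling) {
        for (long i = 0; i < n; i++) payload[i] = init_elem;  /* + profiling work */
    } else {
        for (long i = 0; i < n; i++) payload[i] = init_elem;
    }
    return 0;
}
```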
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
This moves all URL references to the Trac wiki to their corresponding
GitLab counterparts.
The substitutions fall into three classes:
1. Automated substitution using sed with Ben's mapping rule [1]
Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...
New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...
2. Manual substitution for URLs containing `#` index
Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz
New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz
3. Manual substitution for strings starting with `Commentary`
Old: Commentary/XxxYyy...
New: commentary/xxx-yyy...
See also !539
[1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
Long ago, the stable name table and stable pointer tables were one.
Now, they are separate, and have significantly different
implementations. I believe the time has come to finish the split
that began in #7674.
* Divide `rts/Stable` into `rts/StableName` and `rts/StablePtr`.
* Give each table its own mutex.
* Add FFI functions `hs_lock_stable_ptr_table` and
`hs_unlock_stable_ptr_table` and document them.
These are intended to replace the previously undocumented
`hs_lock_stable_tables` and `hs_unlock_stable_tables`,
which are now documented as deprecated synonyms (a usage sketch
follows this list).
* Make `eqStableName#` use pointer equality instead of unnecessarily
comparing stable name table indices.
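A hedged usage sketch of the new locking functions; the prototypes are
repeated here for illustration since the declaring header is not named in
this message.
```
extern void hs_lock_stable_ptr_table(void);
extern void hs_unlock_stable_ptr_table(void);

/* Run an action while only the stable-pointer table is locked; the
   stable-name table (now guarded by its own mutex) is unaffected. */
static void withLockedStablePtrTable(void (*action)(void))
{
    hs_lock_stable_ptr_table();
    action();
    hs_unlock_stable_ptr_table();
}
```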
Reviewers: simonmar, bgamari, erikd
Reviewed By: bgamari
Subscribers: rwbarton, carter
GHC Trac Issues: #15555
Differential Revision: https://phabricator.haskell.org/D5084
This pulls parts of Joachim Breitner's ghc-heap-view library inside GHC.
The bits added are the C hooks into the RTS and a basic Haskell wrapper
to these C hooks. The main reason for adding these to GHC proper
is that the code needs to be kept in sync with the closure types
defined by the RTS. It is expected that the version of HeapView shipped
with GHC will always work with that version of GHC and that extra
functionality can be layered on top with a library like ghc-heap-view
distributed via Hackage.
Test Plan: validate
Reviewers: simonmar, hvr, nomeata, austin, Phyx, bgamari, erikd
Reviewed By: bgamari
Subscribers: carter, patrickdoc, tmcgilchrist, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3055
Here we encode the cost centre list as static data. This means that the
initialization stubs are small functions which should be easy for GCC to
compile, even with optimization.
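A sketch of the general idea with illustrative names (not the layout the
patch actually emits): the cost-centre records and their links become
static data, so the per-module init stub no longer contains one
registration call per cost centre.
```
typedef struct CostCentre_ {
    const char *label;
    struct CostCentre_ *link;          /* next cost centre in the registered list */
} CostCentre;

/* The list is built at compile time ... */
static CostCentre cc_main = { "MAIN", 0 };
static CostCentre cc_foo  = { "foo", &cc_main };

/* ... so the init stub only has to publish the list head. */
CostCentre *CC_LIST = &cc_foo;
```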
Fixes #7960.
Test Plan: Test profiling
Reviewers: austin, erikd, simonmar
Reviewed By: simonmar
Subscribers: rwbarton, thomie
GHC Trac Issues: #7960
Differential Revision: https://phabricator.haskell.org/D3853
Summary:
Get UTF-8-encoded arguments before we call hs_init and use them
instead of ignoring the hs_init arguments. This reduces the behavioural
differences of the RTS between Windows and Linux and simplifies
the code involved.
A few testcases were changed to expect the same result on Windows
as on Linux after the changes.
This fixes #13940.
Test Plan: ./validate
Reviewers: austin, hvr, bgamari, erikd, simonmar, Phyx
Subscribers: Phyx, rwbarton, thomie
GHC Trac Issues: #13940
Differential Revision: https://phabricator.haskell.org/D3739
Our new CPP linter enforces this.
This both says what we mean and silences a bunch of spurious CPP linting
warnings. This pragma is supported by all CPP implementations which we
support.
Reviewers: austin, erikd, simonmar, hvr
Reviewed By: simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3482
Now that we throw an exception for heap overflow, we should only print
the heap overflow message in the main thread when the HeapOverflow
exception is caught, rather than as a side effect in the GC.
Stack overflows were already done this way; I just made heap overflow
consistent with stack overflow and did some related cleanup.
Fixes the broken T2592(profasm) test, which was reporting the heap overflow
message twice (you would only notice when building with profiling
libs enabled).
Test Plan: validate
Reviewers: bgamari, niteria, austin, DemiMarie, hvr, erikd
Reviewed By: bgamari
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3394
This patch replaces calls to barf() in loadArchive() with proper
error handling.
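A sketch of the pattern (simplified, with assumed names rather than the
linker's actual code): instead of calling barf() and aborting the process,
report the problem and return a failure the caller can handle.
```
#include <stdio.h>
#include <stdbool.h>

static bool loadArchiveMember(FILE *f, const char *path)
{
    char magic[8];
    if (fread(magic, 1, sizeof magic, f) != sizeof magic) {
        fprintf(stderr, "loadArchive: truncated archive `%s'\n", path);
        return false;          /* previously a barf() here would abort the RTS */
    }
    /* ... parse the member ... */
    return true;
}
```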
Test Plan: GHC CI
Reviewers: rwbarton, erikd, hvr, austin, simonmar, bgamari
Reviewed By: bgamari
Subscribers: thomie
Tags: #ghc
Differential Revision: https://phabricator.haskell.org/D2652
GHC Trac Issues: #12388
Summary:
Visible API changes:
* The C struct `GCDetails` gives the stats about a single GC. This is
passed to the `gcDone()` callback if one is set via the
RtsConfig. (previously we just passed a collection of values, so this
is more extensible, at the expense of breaking the existing API)
* `RTSStats` gives cumulative stats since the start of the program,
and includes the `GCDetails` for the most recent GC. This struct
can be obtained via `getRTSStats()` (the old `getGCStats()` has been
removed, and `getGCStatsEnabled()` has been renamed to
`getRTSStatsEnabled()`)
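A hedged usage sketch of the functions named above; the two fields read
here (`gcs` and `allocated_bytes`) follow the `_bytes` naming convention
the message describes, but treat the header and the exact field set as
assumptions.
```
#include <stdio.h>
#include <inttypes.h>
#include "Rts.h"            /* assumed to declare RTSStats and getRTSStats() */

static void reportRtsStats(void)
{
    if (!getRTSStatsEnabled())
        return;
    RTSStats s;
    getRTSStats(&s);
    printf("GCs: %" PRIu32 ", allocated: %" PRIu64 " bytes\n",
           s.gcs, s.allocated_bytes);
}
```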
Improvements:
* The per-GC stats and cumulative stats are now cleanly separated.
* Inside the RTS we have a top-level `RTSStats` struct to keep all our
stats in; previously this was just a collection of strangely-named
variables. This struct is mostly just copied in `getRTSStats()`, so
the implementation of that function is a lot shorter.
* Types are more consistent. We use a uint64_t byte count for all
memory values, and Time for all time values.
* Names are more consistent. We use a suffix `_bytes` for all byte
counts and `_ns` for all time values.
* We now collect information about the amount of memory in large
objects and compact objects in `GCDetails`. (the latter was the reason
I started doing this patch but it seems to have ballooned a bit!)
* I fixed a bug in the calculation of the elapsed MUT time, and added
an ASSERT to stop the calculations going wrong in the future.
For now I kept the Haskell API in `GHC.Stats` the same, by
impedance-matching with the new API. We could either break that API
and make it match the C API more closely, or we could add a new API
and deprecate the old one. Opinions welcome.
This stuff is very easy to get wrong, and it's hard to test. Reviews
welcome!
Test Plan:
manual testing
validate
Reviewers: bgamari, niteria, austin, ezyang, hvr, erikd, rwbarton, Phyx
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2756
Summary:
This patch moves the GNU C version test (for 4.5 and newer) that guards
the use of __builtin_unreachable, previously done directly in the CNF.c
code, into a new RTS_UNREACHABLE macro placed in Rts.h.
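A hedged sketch of what such a macro typically looks like; the actual
definition in Rts.h may differ (for instance in its fallback branch).
```
#include <stdlib.h>

#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
#define RTS_UNREACHABLE __builtin_unreachable()
#else
#define RTS_UNREACHABLE abort()   /* one possible fallback on older compilers */
#endif
```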
Reviewers: bgamari, austin, simonmar, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2457
Summary:
The aim here is to reduce the number of remote memory accesses on
systems with a NUMA memory architecture, typically multi-socket servers.
Linux provides a NUMA API for doing two things:
* Allocating memory local to a particular node
* Binding a thread to a particular node
When given the +RTS --numa flag, the runtime will
* Determine the number of NUMA nodes (N) by querying the OS
* Assign capabilities to nodes, so cap C is on node C%N
* Bind worker threads on a capability to the correct node
* Keep separate free lists in the block layer for each node
* Allocate the nursery for a capability from node-local memory
* Allocate blocks in the GC from node-local memory (a sketch of the node
assignment follows this list)
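A minimal sketch of the assignment policy (cap C on node C%N) together
with one way to bind a worker thread, here using Linux libnuma purely as
an example; the RTS's actual binding code may differ.
```
#include <numa.h>   /* Linux libnuma; assumes numa_available() has been checked */

static int nodeOfCapability(int cap_no, int n_nodes)
{
    return cap_no % n_nodes;            /* cap C lives on node C % N */
}

static void bindWorkerToNode(int cap_no, int n_nodes)
{
    struct bitmask *bm = numa_allocate_nodemask();
    numa_bitmask_setbit(bm, nodeOfCapability(cap_no, n_nodes));
    numa_run_on_node_mask(bm);          /* run this thread on the chosen node */
    numa_set_membind(bm);               /* prefer node-local memory for it */
    numa_free_nodemask(bm);
}
```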
For example, using nofib/parallel/queens on a 24-core 2-socket machine:
```
$ ./Main 15 +RTS -N24 -s -A64m
Total time 173.960s ( 7.467s elapsed)
$ ./Main 15 +RTS -N24 -s -A64m --numa
Total time 150.836s ( 6.423s elapsed)
```
The biggest win here is expected to be allocating from node-local
memory, so that means programs using a large -A value (as here).
According to perf, on this program the number of remote memory accesses
were reduced by more than 50% by using `--numa`.
Test Plan:
* validate
* There's a new flag --debug-numa=<n> that pretends to do NUMA without
actually making the OS calls, which is useful for testing the code
on non-NUMA systems.
* TODO: I need to add some unit tests
Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2199
This macro is doubly redundant: first of all, ancient GCCs prior to
version 3.0 are not supported anymore; more importantly, we require
an ISO C99-compliant compiler, so we can use the proper ISO C syntax
without worrying about compatibility.
Reviewers: austin, bgamari
Reviewed By: bgamari
Subscribers: carter, thomie
Differential Revision: https://phabricator.haskell.org/D2121
Differential Revision: https://phabricator.haskell.org/D1198#40948
This adds basic support to the RTS for DWARF-assisted unwinding of the
Haskell and C stack via libdw. This only adds the infrastructure;
consumers of this functionality will be introduced in future diffs.
Currently we are carrying the initial register collection code in
Libdw.c but this will eventually make its way upstream to libdw.
Test Plan: See future patches
Reviewers: Tarrasch, scpmw, austin, simonmar
Reviewed By: austin, simonmar
Subscribers: simonmar, thomie, erikd
Differential Revision: https://phabricator.haskell.org/D1196
GHC Trac Issues: #10656
Summary:
Hooks rely on static linking semantics, and are broken by -Bsymbolic
which we need when using dynamic linking.
Test Plan: Built it
Reviewers: austin, hvr, tibbe
Differential Revision: https://phabricator.haskell.org/D8
Summary:
As proposed in [1], this extension introduces a new syntactic form
`static e`, where `e :: a` can be any closed expression. The static form
produces a value of type `StaticPtr a`, which works as a reference that
programs can "dereference" to get the value of `e` back. References are
like `Ptr`s, except that they are stable across invocations of a
program.
The relevant wiki pages are [2, 3], which describe the motivation/ideas
and implementation plan respectively.
[1] Jeff Epstein, Andrew P. Black, and Simon Peyton-Jones. Towards
Haskell in the cloud. SIGPLAN Not., 46(12):118–129, September 2011. ISSN
0362-1340.
[2] https://ghc.haskell.org/trac/ghc/wiki/StaticPointers
[3] https://ghc.haskell.org/trac/ghc/wiki/StaticPointers/ImplementationPlan
Authored-by: Facundo Domínguez <facundo.dominguez@tweag.io>
Authored-by: Mathieu Boespflug <m@tweag.io>
Authored-by: Alexander Vershilov <alexander.vershilov@tweag.io>
Test Plan: `./validate`
Reviewers: hvr, simonmar, simonpj, austin
Reviewed By: simonpj, austin
Subscribers: qnikst, bgamari, mboes, carter, thomie, goldfire
Differential Revision: https://phabricator.haskell.org/D550
GHC Trac Issues: #7015
This reverts commit 35672072b4091d6f0031417bc160c568f22d0469.
Conflicts:
compiler/main/DriverPipeline.hs
Summary:
In preparation for indirecting all references to closures,
we rename _closure to _static_closure to ensure any old code
will get an undefined symbol error. In order to reference
a closure foobar_closure (which is now undefined), you should instead
use STATIC_CLOSURE(foobar). For convenience, a number of these
old identifiers are macro'd.
Across C-- and C (Windows and otherwise), there were differing
conventions on whether or not foobar_closure or &foobar_closure
was the address of the closure. Now, all foobar_closure references
are addresses, and no & is necessary.
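A hedged sketch of what such a compatibility macro could look like; the
actual definition in the patch may well differ. The point is only that
every reference expands to an address and goes through one macro.
```
typedef struct StgClosure_ StgClosure;

#define STATIC_CLOSURE(name) ((StgClosure *)&name##_static_closure)

extern StgClosure foobar_static_closure;       /* the renamed symbol */
#define foobar_closure STATIC_CLOSURE(foobar)  /* convenience alias for old code */
```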
CHARLIKE/INTLIKE were not changed, simply alpha-renamed.
Part of remove HEAP_ALLOCED patch set (#8199)
Depends on D265
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
Test Plan: validate
Reviewers: simonmar, austin
Subscribers: simonmar, ezyang, carter, thomie
Differential Revision: https://phabricator.haskell.org/D267
GHC Trac Issues: #8199
Signed-off-by: Austin Seipp <austin@well-typed.com>
This requires that stackOverflow() in RtsUtils.c be passed a reference
to the current TSO, which in turn requires a small change in libraries/base.
This reverts commit d85044f6b201eae0a9e453b89c0433608e0778f0.
When servicing a stack overflow, only throw an exception to the given
thread if the user explicitly set a max stack size, using +RTS -K.
Otherwise just service it normally and grow the stack.
In case we actually run out of *heap* (stack chunks are allocated on
the heap), we need to bail out by calling the stackOverflow() hook and
exit immediately.
Authored-by: Ben Gamari <bgamari.foss@gmail.com>
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Based on a patch from Yuras Shumovich.
Fixes T3807 on OS X 32.
It makes sanity-checking fail.
The main change here is that the Cmm parser now allows high-level cmm
code with argument-passing and function calls. For example:
  foo ( gcptr a, bits32 b )
  {
    if (b > 0) {
      // we can make tail calls passing arguments:
      jump stg_ap_0_fast(a);
    }
    return (x,y);
  }
More details on the new cmm syntax are in Note [Syntax of .cmm files]
in CmmParse.y.
The old syntax is still more-or-less supported for those occasional
code fragments that really need to explicitly manipulate the stack.
However there are a couple of differences: it is now obligatory to
give a list of live GlobalRegs on every jump, e.g.
jump %ENTRY_CODE(Sp(0)) [R1];
Again, more details in Note [Syntax of .cmm files].
I have rewritten most of the .cmm files in the RTS into the new
syntax, except for AutoApply.cmm which is generated by the genapply
program: this file could be generated in the new syntax instead and
would probably be better off for it, but I ran out of enthusiasm.
Some other changes in this batch:
- The PrimOp calling convention is gone, primops now use the ordinary
NativeNodeCall convention. This means that primops and "foreign
import prim" code must be written in high-level cmm, but they can
now take more than 10 arguments.
- CmmSink now does constant-folding (should fix #7219)
- .cmm files now go through the cmmPipeline, and as a result we
generate better code in many cases. All the object files generated
for the RTS .cmm files are now smaller. Performance should be
better too, but I haven't measured it yet.
- RET_DYN frames are removed from the RTS, lots of code goes away
- we now have some more canned GC points to cover unboxed-tuples with
2-4 pointers, which will reduce code size a little.
It turns out that we can use %zu and %llu on Win32, provided we
include PosixSource everywhere we want to use them.
On Win32 it's not recognised, so we unfortunately can't use it
unconditionally.
Mostly this meant getting pointer<->int conversions to use the right
sizes. lnat is now size_t, rather than unsigned long, as that seems a
better match for how it's used.
It was assuming that longs are word-sized, which is not the case on
Win64.
I don't think it was causing any problems, but
TimeToUS(x+y)
would have evaluated to
x + (y / 1000)
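That is the classic missing-parentheses macro bug; a sketch of the
before/after shape (the actual definitions live in the RTS's time headers
and may differ in detail):
```
#define TimeToUS_buggy(t) (t / 1000)    /* TimeToUS_buggy(x+y) => (x + y/1000) */
#define TimeToUS(t)       ((t) / 1000)  /* parenthesised: ((x+y) / 1000)       */
```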
Terminology cleanup: the type "Ticks" has been renamed "Time", which
is an StgWord64 in units of TIME_RESOLUTION (currently nanoseconds).
The terminology "tick" is now used consistently to mean the interval
between timer signals.
The ticker now always ticks in realtime (actually CLOCK_MONOTONIC if
we have it). Before it used CPU time in the non-threaded RTS and
realtime in the threaded RTS, but I've discovered that the CPU timer
has terrible resolution (at least on Linux) and isn't much use for
profiling. So now we always use realtime. This should also fix
The default tick interval is now 10ms, except when profiling where we
drop it to 1ms. This gives more accurate profiles without affecting
runtime too much (<1%).
Lots of cleanups - the resolution of Time is now in one place
only (Rts.h) rather than having calculations that depend on the
resolution scattered all over the RTS. I hope I found them all.
Rather than have main() be statically compiled as part of the RTS, we
now generate it into the tiny C file that we compile when linking a
binary.
The main motivation is that we want to pass the settings for the
-rtsopts and -with-rtsopts flags into the RTS, rather than relying on
fragile linking semantics to override the defaults, which don't work
with DLLs on Windows (#5373). In order to do this, we need to extend
the API for initialising the RTS, so now we have:
void hs_init_ghc (int *argc, char **argv[], // program arguments
RtsConfig rts_config); // RTS configuration
hs_init_ghc() can optionally be used instead of hs_init(), and allows
passing in configuration options for the RTS. RtsConfig is a struct,
which currently has two fields:
typedef struct {
RtsOptsEnabledEnum rts_opts_enabled;
const char *rts_opts;
} RtsConfig;
but might have more in the future. There is a default value for the
struct, defaultRtsConfig, the idea being that you start with this and
override individual fields as necessary.
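A hedged usage sketch of this API; only the two RtsConfig fields shown in
the message are set, and the header names and the RtsOptsAll value are
assumptions.
```
#include "HsFFI.h"  /* assumed to declare hs_exit */
#include "Rts.h"    /* assumed to declare hs_init_ghc, RtsConfig, defaultRtsConfig */

int main(int argc, char *argv[])
{
    RtsConfig conf = defaultRtsConfig;      /* start from the defaults ... */
    conf.rts_opts_enabled = RtsOptsAll;     /* ... and override fields as needed */
    conf.rts_opts = "-N2 -A32m";
    hs_init_ghc(&argc, &argv, conf);
    /* ... call into Haskell ... */
    hs_exit();
    return 0;
}
```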
In fact, main() was in a separate static library, libHSrtsmain.a.
That's now gone.