| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
|
|
|
|
|
|
| |
So we did *not* have the stgCallocBytes prototype, and subsequently
the C compiler defaulted to `int` as a return value. Thus generating
sxtw instructions for the return value of stgCalloBytes to produce
the expected void *.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
This drops allocateExec for darwin, and replaces it with
a alloc, write, mark executable strategy instead. This prevents
us from trying to allocate an executable range and then write to
it, which X^W will prohibit on darwin.
This will *only* work if we can use mmap.
|
|
|
|
| |
This is a pre-requisite for making aarch64-darwin work.
|
|
|
|
|
| |
Previously we were treating the thread ID as a HANDLE, but it is not. We
must first OpenThread.
|
|
|
|
|
|
|
|
| |
Previously `pathstat` relied on msvcrt's `stat` implementation, which was
not long-path-aware. It should rather be defined in terms of the `stat`
implementation provided by `utils/fs`.
Fixes #19541.
|
| |
|
| |
|
|
|
|
|
|
| |
tuples and sums.
fixes #1257
|
|
|
|
| |
This means it will be reposted everytime the eventlog is started.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Implement new debugger command `:ignore` to set an `ignore count`
for a specified breakpoint.
* Allow new optional parameter on `:continue` command to set an
`ignore count` for the current breakpoint.
* In the Interpreter replace the current `Word8` BreakArray with
an `Int` array.
* Change semantics of values in `BreakArray` to:
n < 0 : Breakpoint is disabled.
n == 0 : Breakpoint is enabled.
n > 0 : Breakpoint is enabled, but ignore next `n` iterations.
* Rewrite `:enable`/`:disable` processing as a special case of `:ignore`.
* Remove references to `BreakArray` from `ghc/UI.hs`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Related to #19381 #19359 #14702
After a spike in memory usage we have been conservative about returning
allocated blocks to the OS in case we are still allocating a lot and would
end up just reallocating them. The result of this was that up to 4 * live_bytes
of blocks would be retained once they were allocated even if memory usage ended up
a lot lower.
For a heap of size ~1.5G, this would result in OS memory reporting 6G which is
both misleading and worrying for users.
In long-lived server applications this results in consistent high memory
usage when the live data size is much more reasonable (for example ghcide)
Therefore we have a new (2021) strategy which starts by retaining up to 4 * live_bytes
of blocks before gradually returning uneeded memory back to the OS on subsequent
major GCs which are NOT caused by a heap overflow.
Each major GC which is NOT caused by heap overflow increases the consec_idle_gcs
counter and the amount of memory which is retained is inversely proportional to this number.
By default the excess memory retained is
oldGenFactor (controlled by -F) / 2 ^ (consec_idle_gcs * returnDecayFactor)
On a major GC caused by a heap overflow, the `consec_idle_gcs` variable is reset to 0
(as we could continue to allocate more, so retaining all the memory might make sense).
Therefore setting bigger values for `-Fd` makes the rate at which memory is returned slower.
Smaller values make it get returned faster. Setting `-Fd0` disables the
memory return completely, which is the behaviour of older GHC versions.
The default is `-Fd4` which results in the following scaling:
> mapM print [(x, 1/ (2**(x / 4))) | x <- [1 :: Double ..20]]
(1.0,0.8408964152537146)
(2.0,0.7071067811865475)
(3.0,0.5946035575013605)
(4.0,0.5)
(5.0,0.4204482076268573)
(6.0,0.35355339059327373)
(7.0,0.29730177875068026)
(8.0,0.25)
(9.0,0.21022410381342865)
(10.0,0.17677669529663687)
(11.0,0.14865088937534013)
(12.0,0.125)
(13.0,0.10511205190671433)
(14.0,8.838834764831843e-2)
(15.0,7.432544468767006e-2)
(16.0,6.25e-2)
(17.0,5.255602595335716e-2)
(18.0,4.4194173824159216e-2)
(19.0,3.716272234383503e-2)
(20.0,3.125e-2)
So after 13 consecutive GCs only 0.1 of the maximum memory used will be retained.
Further to this decay factor, the amount of memory we attempt to retain is
also influenced by the GC strategy for the oldest generation. If we are using
a copying strategy then we will need at least 2 * live_bytes for copying to take
place, so we always keep that much. If using compacting or nonmoving then we need a lower number,
so we just retain at least `1.2 * live_bytes` for some protection.
In future we might want to make this behaviour more aggressive, some
relevant literature is
> Ulan Degenbaev, Jochen Eisinger, Manfred Ernst, Ross McIlroy, and Hannes Payer. 2016. Idle time garbage collection scheduling. SIGPLAN Not. 51, 6 (June 2016), 570–583. DOI:https://doi.org/10.1145/2980983.2908106
which describes the "memory reducer" in the V8 javascript engine which
on an idle collection immediately returns as much memory as possible.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If startEventlog is called after the program has already started running
then quite a few useful events are missing from the eventlog because
they are only posted when the program starts. This patch adds a
mechanism to declare that an event should be reposted everytime the
startEventlog function is called.
Now in EventLog.c there is a global list of functions called
`eventlog_header_funcs` which stores a list of functions which should be
called everytime the eventlog starts.
When calling `postInitEvent`, the event will not only be immediately
posted to the eventlog but also added to the global list.
When startEventLog is called, the list is traversed and the events
reposted.
|
|
|
|
|
|
|
|
|
|
|
| |
The way in which allocatePinned took blocks out of the nursery was
leading to horrible fragmentation in some workloads.
The strategy now is that a separate free block list is reserved for each
capability and blocks are taken from there. When it's empty the global
SM lock is taken and a fresh block of size PINNED_EMPTY_SIZE is allocated.
Fixes #19481
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The BLOCKS_SIZE event reports the size of the currently allocated blocks
in bytes.
It is like the HEAP_SIZE event, but reports about the blocks rather than
megablocks.
You can work out the current heap fragmentation by looking at the
difference between HEAP_SIZE and BLOCKS_SIZE.
Fixes #19357
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See #19357
The event reports the
* Current number of megablocks allocated
* The number that the RTS thinks it needs
* The number is managed to return to the OS
When current > need then the difference is returned to the OS, the
successful number of returned mblocks is reported by 'returned'.
In a fragmented heap current > need but returned < current - need.
|
|
|
|
| |
This enables a registerised build for the riscv64 architecture.
|
|
|
|
|
|
|
|
| |
markLiveObject is called by GC worker threads and therefore must be
thread-safe. This was a rather egregious oversight which the testsuite
missed.
(cherry picked from commit fe28a062e47bd914a6879f2d01ff268983c075ad)
|
|
|
|
|
|
|
|
|
|
| |
The `whereFrom` function provides a Haskell interface for using the
information created by `-finfo-table-map`. Given a Haskell value, the
info table address will be passed to the `lookupIPE` function in order
to attempt to find the source location information for that particular closure.
At the moment it's not possible to distinguish the absense of the map
and a failed lookup.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new flag embeds a lookup table from the address of an info table
to information about that info table.
The main interface for consulting the map is the `lookupIPE` C function
> InfoProvEnt * lookupIPE(StgInfoTable *info)
The `InfoProvEnt` has the following structure:
> typedef struct InfoProv_{
> char * table_name;
> char * closure_desc;
> char * ty_desc;
> char * label;
> char * module;
> char * srcloc;
> } InfoProv;
>
> typedef struct InfoProvEnt_ {
> StgInfoTable * info;
> InfoProv prov;
> struct InfoProvEnt_ *link;
> } InfoProvEnt;
The source positions are approximated in a similar way to the source
positions for DWARF debugging information. They are only approximate but
in our experience provide a good enough hint about where the problem
might be. It is therefore recommended to use this flag in conjunction
with `-g<n>` for more accurate locations.
The lookup table is also emitted into the eventlog when it is available
as it is intended to be used with the `-hi` profiling mode.
Using this flag will significantly increase the size of the resulting
object file but only by a factor of 2-3x in our experience.
|
|
|
|
|
|
|
|
| |
This profiling mode creates bands by the address of the info table for
each closure. This provides a much more fine-grained profiling output
than any of the other profiling modes.
The `-hi` profiling mode does not require a profiling build.
|
|
|
|
|
|
|
|
|
|
| |
This patch exposes three new functions in `GHC.Profiling` which allow
heap profiling to be enabled and disabled dynamically.
1. startHeapProfTimer - Starts heap profiling with the given RTS options
2. stopHeapProfTimer - Stops heap profiling
3. requestHeapCensus - Perform a heap census on the next context
switch, regardless of whether the timer is enabled or not.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the eventlog infrastructure had a couple of races that could
pop up when using the startEventLog/endEventLog interfaces. In
particular, stopping and then later restarting logging could result in
data preceding the eventlog header, breaking the integrity of the
stream.
To fix this we rework the invariants regarding the eventlog and
generally tighten up the concurrency control surrounding starting and
stopping of logging.
We also fix an unrelated bug, wherein log events from disabled
capabilities could end up never flushed.
|
|
|
|
|
|
|
| |
Previously flushEventLog failed to flush anything but the global event
buffer in the non-threaded RTS.
Fixes #19436.
|
|
|
|
|
|
|
|
|
|
|
| |
The previous approach performed the flush in yieldCapability. However,
as pointed out in #19435, this is wrong as it idle capabilities will not
go through this codepath.
The fix is simple: undo the optimisation, flushing in `flushEventLog` by
calling `flushAllCapsEventsBufs` after acquiring all capabilities.
Fixes #19435.
|
|
|
|
|
|
|
| |
It should be left to tooling to perform the filtering to remove these
specific closure types from the profile if desired.
Fixes #16795
|
|
|
|
|
|
|
| |
This introduces a flag, --eventlog-flush-interval, which can be used to
set an upper bound on the amount of time for which an eventlog event
will remain enqueued. This can be useful in real-time monitoring
settings.
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using -fdicts-strict we generate references to absentError while
compiling ghc-prim. However we always load ghc-prim before base so this
caused linker errors.
We simply solve this by moving absentError into ghc-prim. This does mean
it's now a panic instead of an exception which can no longer be caught.
But given that it should only be thrown if there is a compiler error
that seems acceptable, and in fact we already do this for
absentSumFieldError which has similar constraints.
|
| |
|
|
|
|
| |
This is a well known technique to reduce inter-CPU bus traffic while waiting for the lock by reducing the number of writes.
|
|
|
|
|
|
|
|
|
| |
This function is exposed in the RtsAPI.h so that external users have a
blessed way to traverse all the different `bdescr`s which are known by
the RTS.
The main motivation is to use this function in ghc-debug but avoid
having to expose the internal structure of a Capability in the API.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Simon's concern in the old comment, specifically:
So all of the calls to traverseMaybeInitClosureData() here are
initialising retainer sets with the wrong flip.
Is actually exactly what the code was intended to do. It makes the closure
data valid, then at the beginning of the traversal the flip bit is flipped
resetting all closures across the heap to invalid.
Now it used to be that the profiling code using the traversal has it's own
sense of valid vs. invalid beyond what the traversal code does and indeed
the retainer profiler still does this, there a getClosureData of NULL is
considered an invalid retainer set.
So in effect there wasn't any difference in invalidating closure data
rather than just resetting it to a valid zero, which might be what confused
Simon at the time.
As the code is now it actually uses the value of the valid/invalid bit in
the form of the 'first_visit' argument to the 'visit' callback so there is
a difference.
|
|
|
|
| |
GCC warns that varadic functions simply cannot be inlined.
|
| |
|
| |
|
| |
|
|
|
|
|
| |
For now this just tests that the order of the callbacks is what we expect
for a couple of synthetic heap graphs.
|
|
|
|
|
|
|
| |
The point of this is to let user code call traversePushClosure directly
instead of going through traversePushRoot. This in turn allows specifying a
stackElement to be used when the traversal returns from a top-level (root)
closure.
|
| |
|
| |
|
|
|
|
|
| |
This allows the global 'flip' variable not to be exported. This allows a
future commit to also make it part of the traversalState struct.
|
|
|
|
|
| |
Having a union in the closure profiling header really just complicates
things so get back to basics, we just have a single StgWord there for now.
|
|
|
|
| |
data_out was renamed to child_data at some point
|