| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to make the packages in this repo "reinstallable", we need to
associate source code with a specific packages. Having a top level
`/includes` dir that mixes concerns (which packages' includes?) gets in
the way of this.
To start, I have moved everything to `rts/`, which is mostly correct.
There are a few things however that really don't belong in the rts (like
the generated constants haskell type, `CodeGen.Platform.h`). Those
needed to be manually adjusted.
Things of note:
- No symlinking for sake of windows, so we hard-link at configure time.
- `CodeGen.Platform.h` no longer as `.hs` extension (in addition to
being moved to `compiler/`) so as not to confuse anyone, since it is
next to Haskell files.
- Blanket `-Iincludes` is gone in both build systems, include paths now
more strictly respect per-package dependencies.
- `deriveConstants` has been taught to not require a `--target-os` flag
when generating the platform-agnostic Haskell type. Make takes
advantage of this, but Hadrian has yet to.
|
|
|
|
|
|
| |
is used outside of the rts so we do this rather than just fish it out of
the repo in ad-hoc way, in order to make packages in this repo more
self-contained.
|
|
|
|
| |
I need to do this now or when I move these files the linter will be mad.
|
|
|
|
|
|
| |
Sized arrays cannot be used in headers that might be imported from C++.
Fixes #20199.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously `timedWaitCondition` assumed that timeouts were referenced
against `CLOCK_MONOTONIC`. This is wrong; by default
`pthread_cond_timedwait` references against `CLOCK_REALTIME`, although
this can be overridden using `pthread_condattr_setclock`.
Fix this and add support for using `CLOCK_MONOTONIC` whenever possible
as it is more robust against system time changes and is likely cheaper
to query. Unfortunately, this is complicated by the fact that older
versions of Darwin did not provide `clock_gettime`, which means we also
need to introduce a fallback path using `gettimeofday`.
Fixes #20144.
|
|
|
|
|
|
|
|
|
|
|
| |
PPC NCG: Implement CAS inline for 32 and 64 bit
testsuite: Add tests for smaller atomic CAS
X86 NCG: Catch calls to CAS C fallback
Primops: Add atomicCasWord[8|16|32|64]Addr#
Add tests for atomicCasWord[8|16|32|64]Addr#
Add changelog entry for new primops
X86 NCG: Fix MO-Cmpxchg W64 on 32-bit arch
ghc-prim: 64-bit CAS C fallback on all archs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Noticed build failures like
```
ghc-stage1: panic! (the 'impossible' happened)
GHC version 9.3.20210721:
pprCallishMachOp_for_C: MO_x64_Ne not supported!
```
on `--tagget=hppa2.0-unknown-linux-gnu`.
The change does not fix all 32-bit unreg target problems,
but at least allows linking final ghc binaries.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
|
|
|
|
|
|
|
|
| |
Running the test suite with asserts enabled is somewhat tricky at the
moment as running it with a GHC compiled the DEBUG way has some hundred
failures from the start. These seem to be unrelated to assertions
though. So this provides a toggle to make it easier to debug failing
assertions using the test suite.
|
|
|
|
| |
Fixes #20160.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we relied on the caller to check the return value from
broadcastCondition and friends, most of whom neglected to do so. Given
that these functions should not fail anyways, I've opted to drop the
return value entirely and rather move the result check into the
OSThreads functions.
This slightly changes the semantics of timedWaitCondition which now
returns false only in the case of timeout, rather than any error as
previously done.
|
|
|
|
| |
Previously we would only catch EDEADLK errors.
|
|
|
|
| |
All uses of these now use ExecPage.
|
|
|
|
|
| |
Here we introduce a very thin abstraction for allocating, filling, and
freezing executable pages to replace allocateExec.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In which we add a new code generator to the Glasgow Haskell
Compiler. This codegen supports ELF and Mach-O targets, thus covering
Linux, macOS, and BSDs in principle. It was tested only on macOS and
Linux. The NCG follows a similar structure as the other native code
generators we already have, and should therfore be realtively easy to
follow.
It supports most of the features required for a proper native code
generator, but does not claim to be perfect or fully optimised. There
are still opportunities for optimisations.
Metric Decrease:
ManyAlternatives
ManyConstructors
MultiLayerModules
PmSeriesG
PmSeriesS
PmSeriesT
PmSeriesV
T10421
T10421a
T10858
T11195
T11276
T11303b
T11374
T11822
T12227
T12545
T12707
T13035
T13253
T13253-spj
T13379
T13701
T13719
T14683
T14697
T15164
T15630
T16577
T17096
T17516
T17836
T17836b
T17977
T17977b
T18140
T18282
T18304
T18478
T18698a
T18698b
T18923
T1969
T3064
T5030
T5321FD
T5321Fun
T5631
T5642
T5837
T783
T9198
T9233
T9630
T9872d
T9961
WWRec
Metric Increase:
T4801
|
| |
|
|
|
|
|
|
|
| |
The __BSD_VISIBLE and _DARWIN_C_SOURCE macros expose non-POSIX prototypes in
system header files. We should scope these to just the ".c" modules that
actually need them, and avoid defining them in header files used in other C
modules.
|
| |
|
|
|
|
| |
It isn't used for anything anymore
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Dynamic-by-default was a mechanism to automatically select the -dynamic
way for some targets.
It was implemented in a convoluted way: it was defined as a flavour
option, hence it couldn't be passed as a global settings (which are
produced by `configure` before considering flavours), so a build system
rule was used to pass -DDYNAMIC_BY_DEFAULT to the C compiler so that
deriveConstants could infer it.
* Make build system has it disabled for 8 years (951e28c0625ece7e0db6ac9d4a1e61e2737b10de)
* It has never been implemented in Hadrian
* Last time someone tried to enable it 1 year ago it didn't work (!2436)
* Having this as a global constant impedes making GHC multi-target (see !5427)
This commit fully removes support for dynamic-by-default. If someone
wants to reimplement something like this, it would probably need to move
the logic in the compiler.
(Doing this would probably need some refactoring of the way the compiler
handles DynFlags: DynFlags are used to store and to pass enabled ways to
many parts of the compiler. It can be set by command-line flags, GHC
API, global settings. In multi-target GHC, we will use DynFlags to load
the target platform and its constants: but at this point with the
current DynFlags implementation we can't easily update the existing
DynFlags with target-specific options such as dynamic-by-default without
overriding ways previously set by the user.)
|
|
|
|
|
| |
Metric Increase:
T11276
|
|
|
|
|
|
|
|
|
| |
This drops allocateExec for darwin, and replaces it with
a alloc, write, mark executable strategy instead. This prevents
us from trying to allocate an executable range and then write to
it, which X^W will prohibit on darwin.
This will *only* work if we can use mmap.
|
|
|
|
|
|
| |
tuples and sums.
fixes #1257
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Related to #19381 #19359 #14702
After a spike in memory usage we have been conservative about returning
allocated blocks to the OS in case we are still allocating a lot and would
end up just reallocating them. The result of this was that up to 4 * live_bytes
of blocks would be retained once they were allocated even if memory usage ended up
a lot lower.
For a heap of size ~1.5G, this would result in OS memory reporting 6G which is
both misleading and worrying for users.
In long-lived server applications this results in consistent high memory
usage when the live data size is much more reasonable (for example ghcide)
Therefore we have a new (2021) strategy which starts by retaining up to 4 * live_bytes
of blocks before gradually returning uneeded memory back to the OS on subsequent
major GCs which are NOT caused by a heap overflow.
Each major GC which is NOT caused by heap overflow increases the consec_idle_gcs
counter and the amount of memory which is retained is inversely proportional to this number.
By default the excess memory retained is
oldGenFactor (controlled by -F) / 2 ^ (consec_idle_gcs * returnDecayFactor)
On a major GC caused by a heap overflow, the `consec_idle_gcs` variable is reset to 0
(as we could continue to allocate more, so retaining all the memory might make sense).
Therefore setting bigger values for `-Fd` makes the rate at which memory is returned slower.
Smaller values make it get returned faster. Setting `-Fd0` disables the
memory return completely, which is the behaviour of older GHC versions.
The default is `-Fd4` which results in the following scaling:
> mapM print [(x, 1/ (2**(x / 4))) | x <- [1 :: Double ..20]]
(1.0,0.8408964152537146)
(2.0,0.7071067811865475)
(3.0,0.5946035575013605)
(4.0,0.5)
(5.0,0.4204482076268573)
(6.0,0.35355339059327373)
(7.0,0.29730177875068026)
(8.0,0.25)
(9.0,0.21022410381342865)
(10.0,0.17677669529663687)
(11.0,0.14865088937534013)
(12.0,0.125)
(13.0,0.10511205190671433)
(14.0,8.838834764831843e-2)
(15.0,7.432544468767006e-2)
(16.0,6.25e-2)
(17.0,5.255602595335716e-2)
(18.0,4.4194173824159216e-2)
(19.0,3.716272234383503e-2)
(20.0,3.125e-2)
So after 13 consecutive GCs only 0.1 of the maximum memory used will be retained.
Further to this decay factor, the amount of memory we attempt to retain is
also influenced by the GC strategy for the oldest generation. If we are using
a copying strategy then we will need at least 2 * live_bytes for copying to take
place, so we always keep that much. If using compacting or nonmoving then we need a lower number,
so we just retain at least `1.2 * live_bytes` for some protection.
In future we might want to make this behaviour more aggressive, some
relevant literature is
> Ulan Degenbaev, Jochen Eisinger, Manfred Ernst, Ross McIlroy, and Hannes Payer. 2016. Idle time garbage collection scheduling. SIGPLAN Not. 51, 6 (June 2016), 570–583. DOI:https://doi.org/10.1145/2980983.2908106
which describes the "memory reducer" in the V8 javascript engine which
on an idle collection immediately returns as much memory as possible.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The BLOCKS_SIZE event reports the size of the currently allocated blocks
in bytes.
It is like the HEAP_SIZE event, but reports about the blocks rather than
megablocks.
You can work out the current heap fragmentation by looking at the
difference between HEAP_SIZE and BLOCKS_SIZE.
Fixes #19357
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See #19357
The event reports the
* Current number of megablocks allocated
* The number that the RTS thinks it needs
* The number is managed to return to the OS
When current > need then the difference is returned to the OS, the
successful number of returned mblocks is reported by 'returned'.
In a fragmented heap current > need but returned < current - need.
|
|
|
|
| |
This enables a registerised build for the riscv64 architecture.
|
|
|
|
|
|
|
|
|
|
| |
The `whereFrom` function provides a Haskell interface for using the
information created by `-finfo-table-map`. Given a Haskell value, the
info table address will be passed to the `lookupIPE` function in order
to attempt to find the source location information for that particular closure.
At the moment it's not possible to distinguish the absense of the map
and a failed lookup.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new flag embeds a lookup table from the address of an info table
to information about that info table.
The main interface for consulting the map is the `lookupIPE` C function
> InfoProvEnt * lookupIPE(StgInfoTable *info)
The `InfoProvEnt` has the following structure:
> typedef struct InfoProv_{
> char * table_name;
> char * closure_desc;
> char * ty_desc;
> char * label;
> char * module;
> char * srcloc;
> } InfoProv;
>
> typedef struct InfoProvEnt_ {
> StgInfoTable * info;
> InfoProv prov;
> struct InfoProvEnt_ *link;
> } InfoProvEnt;
The source positions are approximated in a similar way to the source
positions for DWARF debugging information. They are only approximate but
in our experience provide a good enough hint about where the problem
might be. It is therefore recommended to use this flag in conjunction
with `-g<n>` for more accurate locations.
The lookup table is also emitted into the eventlog when it is available
as it is intended to be used with the `-hi` profiling mode.
Using this flag will significantly increase the size of the resulting
object file but only by a factor of 2-3x in our experience.
|
|
|
|
|
|
|
|
| |
This profiling mode creates bands by the address of the info table for
each closure. This provides a much more fine-grained profiling output
than any of the other profiling modes.
The `-hi` profiling mode does not require a profiling build.
|
|
|
|
|
|
|
|
|
|
| |
This patch exposes three new functions in `GHC.Profiling` which allow
heap profiling to be enabled and disabled dynamically.
1. startHeapProfTimer - Starts heap profiling with the given RTS options
2. stopHeapProfTimer - Stops heap profiling
3. requestHeapCensus - Perform a heap census on the next context
switch, regardless of whether the timer is enabled or not.
|
|
|
|
| |
Fixes #17953
|
| |
|
|
|
|
|
|
|
| |
It should be left to tooling to perform the filtering to remove these
specific closure types from the profile if desired.
Fixes #16795
|
|
|
|
|
|
|
| |
This introduces a flag, --eventlog-flush-interval, which can be used to
set an upper bound on the amount of time for which an eventlog event
will remain enqueued. This can be useful in real-time monitoring
settings.
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using -fdicts-strict we generate references to absentError while
compiling ghc-prim. However we always load ghc-prim before base so this
caused linker errors.
We simply solve this by moving absentError into ghc-prim. This does mean
it's now a panic instead of an exception which can no longer be caught.
But given that it should only be thrown if there is a compiler error
that seems acceptable, and in fact we already do this for
absentSumFieldError which has similar constraints.
|
|
|
|
|
|
|
|
|
| |
This function is exposed in the RtsAPI.h so that external users have a
blessed way to traverse all the different `bdescr`s which are known by
the RTS.
The main motivation is to use this function in ghc-debug but avoid
having to expose the internal structure of a Capability in the API.
|
|
|
|
|
| |
Having a union in the closure profiling header really just complicates
things so get back to basics, we just have a single StgWord there for now.
|
| |
|
|
|
|
|
|
|
|
| |
Move them from the external IOInterface.h to the internal IOManager.h.
The functions are all in fact internal. They are not used from the base
library at all.
Remove ioManagerWakeup as an exported symbol. It is not used elsewhere.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Naming is hard. Where we want to get to is to have a clear internal and
external API for the IO manager within the RTS. What we have right now
is just the external API (used in base for the Haskell side of the
threaded IO manager impls) living in includes/rts/IOManager.h.
We want to add a clear RTS internal API, which really ought to live in
rts/IOManager.h. Several people think it's too confusing to have both:
* includes/rts/IOManager.h for the external API
* rts/IOManager.h for the internal API
So the plan is to add rts/IOManager.{h,c} as the internal parts, and
rename the external part to be includes/rts/IOInterface.h.
It is admittidly not great to have .h files in includes/rts/ called
"interface" since by definition, every .h fle under includes/ is an
interface!
Alternative naming scheme suggestions welcome!
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Previously we would leave the card table of new arrays uninitialized.
This wasn't a soundness issue: at worst we would end up doing
unnecessary scavenging during GC, after which the card table would be
reset. That being said, it seems worth initializing this properly to
avoid both unnecessary work and non-determinism.
Fixes #19143.
|
|
|
|
|
|
| |
used timed wait on condition variable in waitForGcThreads
fix dodgy timespec calculation
|
| |
|
|
|
|
|
|
|
|
| |
I've never observed this counter taking a non-zero value, however I do
think it's existence is justified by the comment in grab_local_todo_block.
I've not added it to RTSStats in GHC.Stats, as it doesn't seem worth the
api churn.
|
|
|
|
| |
We are no longer busyish waiting, so this is no longer meaningful
|
| |
|
|
|
|
|
|
| |
But only when profiling or DEBUG are enabled.
Fixes #17572.
|