In ticket #18733 we noticed a rather serious deficiency in the current
fingerprinting logic for recursive groups. I have described the old
fingerprinting story and its problems in Note [Fingerprinting recursive
groups] and have reworked the story accordingly to avoid these issues.
Fixes #18733.
This uses the highMemDynamic flag introduced earlier to verify that
dynamic objects are properly unloaded.
(This change was originally written by niteria.)
This adds two functions:
* `loadNativeObj`
* `unloadNativeObj`
and implements them for Linux.
They are useful if you want to load a shared object containing Haskell
code using the system linker and have GHC call dlclose() once the
code is no longer referenced from the heap.
Using the system linker allows you to load the shared object
outside the low-memory region. It also loads the DWARF sections
in a way that `perf` understands.
`dl_iterate_phdr` is what makes this implementation Linux-specific.
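To illustrate the Linux-specific piece, here is a minimal sketch of how `dl_iterate_phdr` lets a program ask which loaded object, if any, contains a given address (for example, a code pointer found while scanning the heap). It assumes only glibc's `<link.h>` interface; the helpers `find_owner` and `address_is_mapped` are hypothetical and are not the RTS implementation.

```c
#define _GNU_SOURCE /* dl_iterate_phdr is a GNU extension */
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical query state passed through dl_iterate_phdr's data pointer. */
struct addr_query {
    uintptr_t addr;    /* address being looked up */
    const char *owner; /* dlpi_name of the containing object ("" for the main program) */
    int found;
};

/* Called once per loaded object; returning non-zero stops the walk. */
static int find_owner(struct dl_phdr_info *info, size_t size, void *data)
{
    struct addr_query *q = data;
    (void)size;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue; /* only loadable segments are mapped into memory */
        uintptr_t start = (uintptr_t)(info->dlpi_addr + ph->p_vaddr);
        uintptr_t end = start + ph->p_memsz;
        if (q->addr >= start && q->addr < end) {
            q->owner = info->dlpi_name;
            q->found = 1;
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if addr lies inside a PT_LOAD segment of some loaded object. */
static int address_is_mapped(const void *addr, const char **owner)
{
    struct addr_query q = { (uintptr_t)addr, NULL, 0 };
    dl_iterate_phdr(find_owner, &q);
    if (owner != NULL)
        *owner = q.owner;
    return q.found;
}
```

An unloader could, under these assumptions, run such a check against the addresses it encounters and only call `dlclose()` once none of them point into the object.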
Fixes #16525 by tracking dependencies between object file symbols and
marking symbol liveness during garbage collection.
See Note [Object unloading] in CheckUnload.c for details.
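The core idea, marking liveness through dependencies between objects, can be sketched as a plain graph traversal. This is a heavily simplified, hypothetical model: the `Obj` struct and its fields are made up for illustration, and the actual algorithm is the one described in Note [Object unloading]. An object stays live if the heap references it directly or if a live object depends on its symbols; everything left unmarked is a candidate for unloading.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 8

/* Hypothetical stand-in for a loaded object file. */
typedef struct Obj {
    const char *name;
    struct Obj *deps[MAX_DEPS]; /* objects whose symbols this one references */
    size_t n_deps;
    bool referenced_from_heap;  /* would be set by the GC when it finds pointers into this object */
    bool live;
} Obj;

/* Depth-first marking; the early return also handles dependency cycles. */
static void mark_live(Obj *o)
{
    if (o->live)
        return;
    o->live = true;
    for (size_t i = 0; i < o->n_deps; i++)
        mark_live(o->deps[i]);
}

/* Recomputes liveness for all objects, starting from the heap roots. */
static void compute_liveness(Obj **objs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        objs[i]->live = false;
    for (size_t i = 0; i < n; i++)
        if (objs[i]->referenced_from_heap)
            mark_live(objs[i]);
}
```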
This seems like a reasonable default, as object file size increases
by only around 5%.
It turns out that some important native debugging/profiling tools (e.g.
perf) rely only on symbol tables for function name resolution (as
opposed to using DWARF DIEs). However, previously GHC would emit
temporary symbols (e.g. `.La42b`) to identify module-internal
entities. Such symbols are dropped during linking and therefore not
visible to runtime tools (in addition to having rather unhelpful unique
names). For instance, `perf report` would often end up attributing all
cost to the libc `frame_dummy` symbol since Haskell code was not covered
by any proper symbol (see #17605).
We now instead follow the model of C compilers and emit
descriptively-named local symbols for module-internal things. Since this
will increase object file size, this behavior can be disabled with the
`-fno-expose-internal-symbols` flag.
With this, `perf record` can finally be used against Haskell executables.
Moreover, with `-g3`, `perf annotate` can show the source code inline.
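The `frame_dummy` symptom can be reproduced with a toy model, assuming a tool that attributes each sampled address to the nearest preceding symbol in a sorted symbol table. Real tools are more involved (they may also consult symbol sizes), and the addresses and the `Main_foo_info` name used in the test are hypothetical. Without a symbol covering the Haskell code, the lookup falls back to whatever symbol happens to precede it.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t addr;   /* symbol start address */
    const char *name;
} Sym;

/* syms must be sorted by address. Returns the name of the last symbol
   starting at or below addr, or NULL if addr precedes every symbol. */
static const char *resolve(const Sym *syms, size_t n, uintptr_t addr)
{
    size_t lo = 0, hi = n; /* binary search for the first symbol above addr */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : syms[lo - 1].name;
}
```

Emitting a local symbol at the start of each module-internal entity inserts an entry between the two neighbours, so samples in that code resolve to a meaningful name.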
In various places in the NCG we need the Module currently being
compiled. Let's move this into the environment instead of chewing through
another register.
It appears this was an oversight, as there is no reason to require the full
DynFlags.
|
| |
|
|
|
|
|
| |
This got fixed sometime recently; it is not worth trying to
figure out which commit fixed it.
Co-authored-by: Sven Tennie <sven.tennie@gmail.com>
Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the overflow check for the IMAGE_REL_AMD64_ADDR32NB
relocation failed to account for the signed nature of the value.
Specifically, the overflow check was:
    uint64_t v;
    v = S + A;
    if (v >> 32) { ... }
However, `v` ultimately needs to fit into 32 bits as a signed value.
Consequently, values in the range `2^31 <= v < 2^32` in fact overflow,
yet this is not caught by the existing overflow check.
Here we rewrite the overflow check to instead ensure that
`INT32_MIN <= v <= INT32_MAX`. There is now quite a bit of repetition
between the `IMAGE_REL_AMD64_REL32` and `IMAGE_REL_AMD64_ADDR32` cases,
but I am leaving this cleanup for future work.
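A minimal sketch contrasting the two checks. `S` and `A` follow the message's notation; the function names are hypothetical and not taken from the linker source. Computing `S + A` in a signed 64-bit type keeps negative results representable, and the range test then rejects anything that would be truncated when stored into a signed 32-bit field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old check: only flags values needing more than 32 bits as unsigned. */
static bool old_check_overflows(uint64_t v)
{
    return (v >> 32) != 0;
}

/* New check: v must be representable as a signed 32-bit integer. */
static bool fits_in_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* Hypothetical relocation helper: stores S + A into a 32-bit slot,
   refusing values that would not fit. */
static bool apply_reloc32(int64_t S, int64_t A, int32_t *slot)
{
    int64_t v = S + A;
    if (!fits_in_int32(v))
        return false;
    *slot = (int32_t)v;
    return true;
}
```

For `v = 0x80000000`, `old_check_overflows` reports no overflow even though the value does not fit in an `int32_t`, which is exactly the case the new check rejects.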
This bug was first noticed by @awson.
Fixes #15808.
The previous merge mistakenly reverted it.
Since the latter wants to call getRTSStats.
While at face value this seems a bit heavy, I think it's far better than
enforcing ordering on every access.
We can generally be pretty relaxed in the barriers here since the timer
thread is a loop.
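A hypothetical sketch of why relaxed atomics can suffice for a polling loop like the timer thread: the loop re-reads its stop flag on every iteration, so a store that becomes visible late only costs an extra tick, and no other data is published through the flag. `timer_stop`, `ticks`, `timer_thread` and `stop_timer` are illustrative names, not the RTS's; the synchronization needed after the loop exits comes from `pthread_join`, not from the flag.

```c
#define _POSIX_C_SOURCE 200809L /* for nanosleep */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static atomic_bool timer_stop = false; /* set by another thread to request exit */
static atomic_ulong ticks = 0;         /* stand-in for the timer's periodic work */

static void *timer_thread(void *arg)
{
    const struct timespec period = { 0, 1000000 }; /* 1 ms */
    (void)arg;
    /* Relaxed is enough: a stale "false" just causes one more iteration. */
    while (!atomic_load_explicit(&timer_stop, memory_order_relaxed)) {
        atomic_fetch_add_explicit(&ticks, 1, memory_order_relaxed);
        nanosleep(&period, NULL);
    }
    return NULL;
}

/* Requests shutdown and waits; the join orders the thread's effects
   before anything the caller does afterwards. */
static int stop_timer(pthread_t t)
{
    atomic_store_explicit(&timer_stop, true, memory_order_relaxed);
    return pthread_join(t, NULL);
}
```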
Previously `initScheduler` would attempt to pause the ticker and, in so
doing, acquire the ticker mutex. However, `initTicker`, which is
responsible for initializing said mutex, had not yet been called.
This avoids #17289.
This suppresses the other side of a race during shutdown.
Previously the `current_value`, `first_watch_queue_entry`, and
`num_updates` fields of `StgTVar` were marked as `volatile` in an
attempt to provide strong ordering. Of course, this isn't sufficient.
We now use proper atomic operations. In most of these cases I strengthen
the ordering all the way to SEQ_CST, although it's possible that some
could be weakened with some thought.
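The difference can be illustrated with a hypothetical, stripped-down TVar; `ToyTVar` and its helpers are made up for this sketch and are not the RTS's `StgTVar` API. `volatile` only prevents the compiler from caching or eliding accesses: it gives no inter-thread ordering, and an increment of a `volatile` counter is still a separate load, add and store. Declaring the fields `_Atomic` and using explicit `SEQ_CST` operations, as the message describes, addresses both problems.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the three fields discussed above. */
typedef struct {
    _Atomic(void *) current_value;
    _Atomic(void *) first_watch_queue_entry;
    atomic_long num_updates;
} ToyTVar;

static void toy_tvar_init(ToyTVar *tv, void *initial)
{
    atomic_init(&tv->current_value, initial);
    atomic_init(&tv->first_watch_queue_entry, NULL);
    atomic_init(&tv->num_updates, 0);
}

static void *toy_tvar_read(ToyTVar *tv)
{
    return atomic_load_explicit(&tv->current_value, memory_order_seq_cst);
}

/* The counter bump is a single atomic read-modify-write, unlike
   num_updates++ on a volatile field. */
static void toy_tvar_write(ToyTVar *tv, void *v)
{
    atomic_store_explicit(&tv->current_value, v, memory_order_seq_cst);
    atomic_fetch_add_explicit(&tv->num_updates, 1, memory_order_seq_cst);
}
```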
|
| | |/
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This fixes a potentially harmful race where we failed to synchronize
before looking at a TVar's current_value.
Also did a bit of refactoring to abstract over the management of
max_commits.
Here we are doing lazy initialization; it's okay if we do the check more
than once, hence a relaxed operation is fine.
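A minimal sketch of the pattern, using a hypothetical cached page size as the lazily initialized value. Racing threads may both see `0` and both compute the value, but they store the same result, so the duplicate work is harmless. Relaxed ordering is only sufficient because the cached value is self-contained; if it were a pointer to data initialized elsewhere, the store and load would need release/acquire ordering.

```c
#define _POSIX_C_SOURCE 200809L /* for sysconf */
#include <stdatomic.h>
#include <unistd.h>

/* 0 means "not yet computed". */
static atomic_long cached_page_size = 0;

static long get_page_size(void)
{
    long v = atomic_load_explicit(&cached_page_size, memory_order_relaxed);
    if (v == 0) {
        /* Several threads may get here; each computes the same value. */
        v = sysconf(_SC_PAGESIZE);
        atomic_store_explicit(&cached_page_size, v, memory_order_relaxed);
    }
    return v;
}
```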
Fixes #17275.
After a few attempts at shoring up the previous implementation, I ended
up turning to the literature and now use the proven implementation from
> N.M. Lê, A. Pop, A. Cohen, and F.Z. Nardelli. "Correct and Efficient
> Work-Stealing for Weak Memory Models". PPoPP'13, February 2013,
> ACM 978-1-4503-1922/13/02.
Not only is this approach formally proven correct under C11 semantics,
but it has also been shown to be a bit faster in practice.
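For reference, the core operations of the work-stealing deque from the cited paper can be transcribed into C11 roughly as follows. This is a simplified sketch, not the RTS's own code: it uses a fixed-capacity buffer instead of the paper's resizable array, signed indices so that `bottom - 1` cannot wrap around on an empty deque, and `long` elements with two reserved sentinel values. All names are made up for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEQUE_CAP 64      /* fixed capacity; the paper's version can resize */
#define DEQUE_EMPTY (-1L) /* sentinel: nothing available (not a valid element) */
#define DEQUE_ABORT (-2L) /* sentinel: steal lost a race (not a valid element) */

typedef struct {
    atomic_llong top;    /* thieves steal from this end */
    atomic_llong bottom; /* owner pushes and takes at this end */
    atomic_long buffer[DEQUE_CAP];
} Deque;

static void deque_init(Deque *q)
{
    atomic_init(&q->top, 0);
    atomic_init(&q->bottom, 0);
    for (int i = 0; i < DEQUE_CAP; i++)
        atomic_init(&q->buffer[i], 0);
}

/* Owner only. Returns false when the deque is full. */
static bool deque_push(Deque *q, long x)
{
    long long b = atomic_load_explicit(&q->bottom, memory_order_relaxed);
    long long t = atomic_load_explicit(&q->top, memory_order_acquire);
    if (b - t > DEQUE_CAP - 1)
        return false;
    atomic_store_explicit(&q->buffer[b % DEQUE_CAP], x, memory_order_relaxed);
    atomic_thread_fence(memory_order_release); /* publish the element before bottom */
    atomic_store_explicit(&q->bottom, b + 1, memory_order_relaxed);
    return true;
}

/* Owner only: pops the most recently pushed element (LIFO). */
static long deque_take(Deque *q)
{
    long long b = atomic_load_explicit(&q->bottom, memory_order_relaxed) - 1;
    atomic_store_explicit(&q->bottom, b, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    long long t = atomic_load_explicit(&q->top, memory_order_relaxed);
    long x;
    if (t <= b) {
        x = atomic_load_explicit(&q->buffer[b % DEQUE_CAP], memory_order_relaxed);
        if (t == b) {
            /* Last element: race against thieves for it. */
            if (!atomic_compare_exchange_strong_explicit(&q->top, &t, t + 1,
                    memory_order_seq_cst, memory_order_relaxed))
                x = DEQUE_EMPTY;
            atomic_store_explicit(&q->bottom, b + 1, memory_order_relaxed);
        }
    } else {
        x = DEQUE_EMPTY;
        atomic_store_explicit(&q->bottom, b + 1, memory_order_relaxed);
    }
    return x;
}

/* Any thread: takes the oldest element (FIFO end). */
static long deque_steal(Deque *q)
{
    long long t = atomic_load_explicit(&q->top, memory_order_acquire);
    atomic_thread_fence(memory_order_seq_cst);
    long long b = atomic_load_explicit(&q->bottom, memory_order_acquire);
    long x = DEQUE_EMPTY;
    if (t < b) {
        x = atomic_load_explicit(&q->buffer[t % DEQUE_CAP], memory_order_relaxed);
        if (!atomic_compare_exchange_strong_explicit(&q->top, &t, t + 1,
                memory_order_seq_cst, memory_order_relaxed))
            return DEQUE_ABORT;
    }
    return x;
}
```

The owning thread uses `deque_push` and `deque_take` on its own end, while other threads call `deque_steal`; a `DEQUE_ABORT` result only means the steal lost a race and may be retried.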
Not only is this in general a good idea, but it turns out that GCC
unrolls the retry loop, resulting in massive code bloat in critical
parts of the RTS (e.g. `evacuate`).
Ensure that the GC leader synchronizes with workers before calling
stat_endGC.
Previously we would take all capabilities but fail to join on the thread
itself, potentially resulting in a leaked thread.