| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously it was unclear whether req_shared_libs should require:
* that the platform supports dynamic library loading,
* that GHC supports dynamic linking of Haskell code, or
* that the dyn way libraries were built
Clarify by splitting the predicate into two:
* `req_dynamic_lib_support` demands that the platform support dynamic
linking
* `req_dynamic_hs` demands that the GHC support dynamic linking of
Haskell code on the target platform
Naturally `req_dynamic_hs` cannot be true unless
`req_dynamic_lib_support` is also true.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Emit an Info Table Provenance Entry (IPE) for every stack represeted info table
if -finfo-table-map is turned on.
To decode a cloned stack, lookupIPE() is used. It provides a mapping between
info tables and their source location.
Please see these notes for details:
- [Stacktraces from Info Table Provenance Entries (IPE based stack unwinding)]
- [Mapping Info Tables to Source Positions]
Metric Increase:
T12545
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add `StackSnapshot#` primitive type that represents a cloned stack (StgStack).
The cloning interface consists of two functions, that clone either the treads
own stack (cloneMyStack) or another threads stack (cloneThreadStack).
The stack snapshot is offline/cold, i.e. it isn't evaluated any further. This is
useful for analyses as it prevents concurrent modifications.
For technical details, please see Note [Stack Cloning].
Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are some obscure situations where the RHS of a rule can contain a
tick which is not mentioned anywhere else in the program. If this
happens you end up with an obscure linker error. The solution is quite
simple, traverse the RHS of rules to also look for ticks. It turned out
to be easier to implement if the traversal was moved into CoreTidy
rather than at the start of code generation because there we still had
easy access to the rules.
./StreamD.o(.text+0x1b9f2): error: undefined reference to 'StreamK_mkStreamFromStream_HPC_cc'
./MArray.o(.text+0xbe83): error: undefined reference to 'StreamK_mkStreamFromStream_HPC_cc'
Main.o(.text+0x6fdb): error: undefined reference to 'StreamK_mkStreamFromStream_HPC_cc'
|
| |
|
|
|
|
|
|
|
| |
As noted in #10037, the `ioprof` test would change its stderr output
(specifically the stacktrace produced by `error`) depending upon
optimisation level. As the `error` backtrace is not the point of this
test, we now ignore the `stderr` output.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch exposes three new functions in `GHC.Profiling` which allow
heap profiling to be enabled and disabled dynamically.
1. startHeapProfTimer - Starts heap profiling with the given RTS options
2. stopHeapProfTimer - Stops heap profiling
3. requestHeapCensus - Perform a heap census on the next context
switch, regardless of whether the timer is enabled or not.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first change makes the array ones use the proper fixed-size types,
which also means that just like before, they can be used without
explicit conversions with the boxed sized types. (Before, it was Int# /
Word# on both sides, now it is fixed sized on both sides).
For the second change, don't use "extend" or "narrow" in some of the
user-facing primops names for conversions.
- Names like `narrowInt32#` are misleading when `Int` is 32-bits.
- Names like `extendInt64#` are flat-out wrong when `Int is
32-bits.
- `narrow{Int,Word}<N>#` however map a type to itself, and so don't
suffer from this problem. They are left as-is.
These changes are batched together because Alex happend to use the array
ops. We can only use released versions of Alex at this time, sadly, and
I don't want to have to have a release thatwon't work for the final GHC
9.2. So by combining these we get all the changes for Alex done at once.
Bump hackage state in a few places, and also make that workflow slightly
easier for the future.
Bump minimum Alex version
Bump Cabal, array, bytestring, containers, text, and binary submodules
|
|
|
|
|
| |
For now this just tests that the order of the callbacks is what we expect
for a couple of synthetic heap graphs.
|
|
|
|
|
|
|
|
|
|
|
| |
This gives a small increase in performance under most circumstances.
For single threaded GC the improvement is on the order of 1-2%.
For multi threaded GC the results are quite noisy but seem to
fall into the same ballpark.
Fixes #16499
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This replaces all Word<N> = W<N># Word# and Int<N> = I<N># Int# with
Word<N> = W<N># Word<N># and Int<N> = I<N># Int<N>#, thus providing us
with properly sized primitives in the codegenerator instead of pretending
they are all full machine words.
This came up when implementing darwinpcs for arm64. The darwinpcs reqires
us to pack function argugments in excess of registers on the stack. While
most procedure call standards (pcs) assume arguments are just passed in
8 byte slots; and thus the caller does not know the exact signature to make
the call, darwinpcs requires us to adhere to the prototype, and thus have
the correct sizes. If we specify CInt in the FFI call, it should correspond
to the C int, and not just be Word sized, when it's only half the size.
This does change the expected output of T16402 but the new result is no
less correct as it eliminates the narrowing (instead of the `and` as was
previously done).
Bumps the array, bytestring, text, and binary submodules.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
Metric Increase:
T13701
T14697
|
|
|
|
|
|
|
|
| |
This introducing a new compiler flag to provide a convenient way to
introduce profiler cost-centers on all occurrences of the named
identifier.
Closes #18566.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As #18355 shows, we were failing to preserve one-shot info when
eta-expanding. It's rather easy to fix, by using ArityType more,
rather than just Arity.
This patch is important to suport the one-shot monad trick;
see #18202. But the extra tracking of one-shot-ness requires
the patch
Define multiShotIO and use it in mkSplitUniqueSupply
If that patch is missing, ths patch makes things worse in
GHC.Types.Uniq.Supply. With it, however, we see these improvements
T3064 compiler bytes allocated -2.2%
T3294 compiler bytes allocated -1.3%
T12707 compiler bytes allocated -1.3%
T13056 compiler bytes allocated -2.2%
Metric Decrease:
T3064
T3294
T12707
T13056
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* support detection of slow ghc-bignum backend (to replace the detection
of integer-simple use). There are still some test cases that the
native backend doesn't handle efficiently enough.
* remove tests for GMP only functions that have been removed from
ghc-bignum
* fix test results showing dependent packages (e.g. integer-gmp) or
showing suggested instances
* fix test using Integer/Natural API or showing internal names
|
|
|
|
|
|
|
| |
Currently, the names of cost centres must be quoted or
be lowercase identifiers.
Fixes #17916.
|
| |
|
|
|
|
|
|
|
|
| |
We can do a heap census with a non-profiling RTS. With a non-profiling
RTS we don't zero superfluous bytes of shrunk arrays hence a need to
handle the case specifically to avoid a crash.
Revert part of a586b33f8e8ad60b5c5ef3501c89e9b71794bbed
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements GHC Proposal #176:
https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0176-scc-parsing.rst
Before the change:
1 / 2 / 2 = 0.25
1 / {-# SCC "name" #-} 2 / 2 = 1.0
After the change:
1 / 2 / 2 = 0.25
1 / {-# SCC "name" #-} 2 / 2 = parse error
|
|
|
|
|
| |
It was previously marked as broken due to #12236 however it passes for
me locally while failing on CI.
|
|
|
|
| |
This test uses -dynamic-too, which is not supported on Windows.
|
|
|
|
|
|
| |
As per https://prime.haskell.org/wiki/Libraries/Proposals/MonadFail
Coauthored-by: Ben Gamari <ben@well-typed.com>
|
|
|
|
| |
See #15382.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Trac #10069 revealed that small NOINLINE functions didn't get split
into worker and wrapper. This was due to `certainlyWillInline`
saying that any unfoldings with a guidance of `UnfWhen` inline
unconditionally. That isn't the case for NOINLINE functions, so we
catch this case earlier now.
Nofib results:
--------------------------------------------------------------------------------
Program Allocs Instrs
--------------------------------------------------------------------------------
fannkuch-redux -0.3% 0.0%
gg +0.0% +0.1%
maillist -0.2% -0.2%
minimax 0.0% -0.8%
--------------------------------------------------------------------------------
Min -0.3% -0.8%
Max +0.0% +0.1%
Geometric Mean -0.0% -0.0%
Fixes #10069.
-------------------------
Metric Increase:
T9233
-------------------------
|
| |
|
| |
|
|
|
|
| |
See Trac #15072, Trac #16349, Trac #16350
|
|
|
|
| |
See #16193.
|
| |
|
|
|
|
|
|
|
| |
As noted in #16227 this test routinely times out when run in the
unregisterised way.
See also #15467.
|
|
|
|
|
| |
This eliminates most uses of run_command in the testsuite in favor of the more
structured makefile_test.
|
|
|
|
| |
This reverts commit 76c8fd674435a652c75a96c85abbf26f1f221876.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because garbage collector calls `retainerProfile()` and `heapCensus()`,
GC times normally include some of PROF times too. To fix this we have
these lines:
// heapCensus() is called by the GC, so RP and HC time are
// included in the GC stats. We therefore subtract them to
// obtain the actual GC cpu time.
stats.gc_cpu_ns -= prof_cpu;
stats.gc_elapsed_ns -= prof_elapsed;
These variables are later used for calculating GC time excluding the
final GC (which should be attributed to EXIT).
exit_gc_elapsed = stats.gc_elapsed_ns - start_exit_gc_elapsed;
The problem is if we subtract PROF times from `gc_elapsed_ns` and then
subtract `start_exit_gc_elapsed` from the result, we end up subtracting
PROF times twice, because `start_exit_gc_elapsed` also includes PROF
times.
We now subtract PROF times from GC after the calculations for EXIT and
MUT times. The existing assertion that checks
INIT + MUT + GC + EXIT = TOTAL
now holds. When we subtract PROF numbers from GC, and a new assertion
INIT + MUT + GC + PROF + EXIT = TOTAL
also holds.
Fixes #15897. New assertions added in this commit also revealed #16102,
which is also fixed by this commit.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Mark arith011 as broken with integer-simple
As noted in #16091, arith011 fails when run against integer-simple with a
"divide by zero" exception. This suggests that integer-gmp and integer-simple
are handling division by zero differently.
* This also fixes broken_without_gmp; the lack of types made the previous
failure silent, sadly. Improves situation of #16043.
* Mark several tests implicitly depending upon integer-gmp as broken
with integer-simple. These expect to see Core coming from integer-gmp,
which breaks with integer-simple.
* Increase runtime timeout multiplier of T11627a with integer-simple
I previously saw that T11627a timed out in all profiling ways when run against
integer-simple. I suspect this is due to integer-simple's rather verbose heap
representation. Let's see whether increasing the runtime timeout helps.
Fixes test for #11627.
This is all in service of fixing #16043.
|
|
|
|
|
|
| |
The retainer profiler no longer uses the C stack for its mark stack (#14758).
Consequently even the small C stack provided on Darwin should be sufficient to
run this test. See #11627
|
|
|
|
|
|
| |
As documented in #15382, this is known to fail in prof_hc_hb on i386.
Concerningly, I have also seen this test non-deterministically fail in
prof_hc_hb on amd64. We should really investigate this.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CONSTR_NOCAF was introduced with 55d535da10d as a replacement for
CONSTR_STATIC and CONSTR_NOCAF_STATIC, however, as explained in Note
[static constructors], we copy CONSTR_NOCAFs (which can also be seen in
evacuate) during GC, and they can become dead, like other CONSTR_X_Ys.
processHeapClosureForDead is updated to reflect this.
Test Plan: Validates on x86_64. Existing failures on i386.
Reviewers: simonmar, bgamari, erikd
Reviewed By: simonmar, bgamari
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #7836, #15063, #15087, #15165
Differential Revision: https://phabricator.haskell.org/D4928
|
| |
|
|
|
|
|
|
|
| |
Darwin tends to give us a very small stack which the retainer profiler tends to
overflow. Strangely, this manifested on CircleCI yet not Harbormaster.
See #15287 and #11627.
|
|
|
|
| |
Due to #15063.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This finally gets us to a green ./validate --slow on linux for a ghc
checkout from the beginning of this week, see
https://circleci.com/gh/ghc/ghc/4739
This is hopefully the final (or second to final) patch to
address #14890.
Test Plan: ./validate --slow
Reviewers: bgamari, hvr, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #14890
Differential Revision: https://phabricator.haskell.org/D4712
|
|
|
|
|
|
|
|
| |
Trac #15108 showed that the simple optimiser in CoreOpt
was accidentally eta-reducing a join point, so it didn't meet
its arity invariant.
This patch fixes it. See Note [Preserve join-binding arity].
|
|
|
|
| |
thanks to cdisselkoen for the nicely minimized test case.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: alpmestan
Reviewed By: alpmestan
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #14931
Differential Revision: https://phabricator.haskell.org/D4518
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was broken by D3746 and/or D3809, but unfortunately we didn't
notice because CI at the time wasn't building the profiling way.
Test Plan:
```
cd testsuite/test/profiling/should_run
make WAY=ghci-ext-prof
```
Reviewers: bgamari, michalt, hvr, erikd
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #14705
Differential Revision: https://phabricator.haskell.org/D4437
|