| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
| |
|
|
|
|
|
| |
It turns out that this was fairly straightforward to implement since we
are now pretty careful about zeroing slop.
|
|
|
|
|
| |
Move the logic for taking censuses of "normal" and pinned blocks to
their own functions.
|
|
|
|
|
|
|
|
| |
Previously when not LDV profiling we would repeatedly reinitialise
`censuses[0]` with `initEra`. This failed to free the `Arena` and
`HashTable` from the old census, resulting in a memory leak.
Fixes #18348.
|
|
|
|
|
|
| |
When not LDV profiling there is no reason to allocate 32 Censuses; one
will do. This is a very small memory footprint optimisation, but it
comes for free.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've had this longstanding issue in the heap profiler, where the time of
the last sample in the profile is sometimes way off causing the rendered
graph to be quite useless for long runs.
It seems to me the problem is that we use mut_user_time() for the last
sample as opposed to getRTSStats(), which we use when calling heapProfile()
in GC.c.
The former is equivalent to getProcessCPUTime() but the latter does
some additional stuff:
getProcessCPUTime() - end_init_cpu - stats.gc_cpu_ns -
stats.nonmoving_gc_cpu_ns
So to fix this just use getRTSStats() in both places.
|
|
|
|
|
| |
The comments make it clear LDV_recordDead should not be called for
inhererently used closures, so add an assertion to codify this fact.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we're doing heap profiling on an unprofiled executable we keep
allocating new space in initEra via nextEra on each profiler run but we
don't have a corresponding freeEra call.
We do free the last era in endHeapProfiling but previous eras will have
been overwritten by initEra and will never get free()ed.
Metric Decrease:
space_leak_001
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
We can do a heap census with a non-profiling RTS. With a non-profiling
RTS we don't zero superfluous bytes of shrunk arrays hence a need to
handle the case specifically to avoid a crash.
Revert part of a586b33f8e8ad60b5c5ef3501c89e9b71794bbed
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this change it is possible to reconstruct the timing portion of a
`.prof` file after the fact. By logging the stacks at each time point
a more precise executation trace of the program can be observed rather
than all identical cost centres being identified in the report.
There are two new events:
1. `EVENT_PROF_BEGIN` - emitted at the start of profiling to communicate
the tick interval
2. `EVENT_PROF_SAMPLE_COST_CENTRE` - emitted on each tick to communicate the
current call stack.
Fixes #17322
|
|
|
|
|
|
|
| |
This commit starts renaming some flip bit related functions for the
generalised heap traversal code and adds provitions for sharing the
per-closure profiling header field currently used exclusively for retainer
profiling with other heap traversal profiling modes.
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new eventlog event which indicates the start of
a biographical profiler sample. These are different to normal events as
they also include the timestamp of when the census took place. This is
because the LDV profiler only emits samples at the end of the run.
Now all the different profiling modes emit consumable events to the
eventlog.
|
|
|
|
|
|
|
|
|
| |
Previously there were numerous places in the RTS where we would fopen
with the "w" flag string. This is wrong as it will not truncate the
file. Consequently if we write less data than the previous length of the
file we will leave garbage at its end.
Fixes #16993.
|
|
|
|
|
|
|
|
|
| |
Currently initProfiling gets defined by Profiling.c only if PROFILING is
defined. Otherwise the ProfHeap.c defines it.
This is just needlessly complicated so in this commit I make Profiling and
ProfHeap into properly seperate modules and call their respective init
functions from RtsStartup.c.
|
|
|
|
|
|
|
|
|
|
|
| |
In dumpCensus we switch/case on doHeapProfile twice. The second switch
tries to barf on unknown doHeapProfile modes but HEAP_BY_CLOSURE_TYPE is
checked by the first switch and not included in the second.
So when trying to pass -hT to the profiling rts it barfs.
This commit simply merges the two switches into one which fixes this
problem.
|
|
|
|
|
| |
It is possible that void_total is exactly equal to not_used and the
other assertions for this check for <= rather than <.
|
|
|
|
|
|
|
|
|
| |
This implements the correct fix for #11627 by skipping over the slop
(which is zeroed) rather than adding special case logic for LARGE
ARR_WORDS which runs the risk of not performing a correct census by
ignoring any subsequent blocks.
This approach implements similar logic to that in Sanity.c
|
|
|
|
|
|
|
| |
This allows a user to observe how long a sampling period lasts so that
the time taken can be removed from the profiling output.
Fixes #16697
|
|
|
|
|
| |
This moves all URL references to Trac tickets to their corresponding
GitLab counterparts.
|
|
|
|
|
|
|
|
| |
As reported in #15382 the `ASSERT(ctr != NULL)` is currently getting routinely
hit during testsuite runs. While this is certainly a bug I would far prefer
getting a proper error message than a segmentation fault. Consequently I'm
turning the `ASSERT` into a proper `if` so we get a proper error in non-debug
builds.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reapply D5346 with fix incompatible shell quoting in tests. It seems
like `$'string'` is not recognized under all test environments, so let's
avoid it in tests.
Test Plan:
```
hp2ps: "T15904".hp, line 2: integer must follow identifier
```
use new ghc and hp2ps to profile a simple program.
Reviewers: simonmar, bgamari, erikd, tdammers
Reviewed By: bgamari
Subscribers: tdammers, carter, rwbarton
GHC Trac Issues: #15904
Differential Revision: https://phabricator.haskell.org/D5388
|
|
|
|
| |
This reverts commit 390df8b51b917fb6409cbde8e73fe838d61d8832.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The format of hp file didn't allow double quotes inside strings, and
under prof build, we include args in JOB, which may have double quotes.
When this happens, the error message is confusing to the user. This can
also happen under normal build if the executable name contains double
quite, which is unlikely though.
We fix this issue by introducing escaping for double quotes inside a
string by repeating it twice.
We also fix a buffer overflow bug when the length of the string happen
to be multiple of 5000.
Test Plan:
new tests, which used to fail with error message:
```
hp2ps: "T15904".hp, line 2: integer must follow identifier
```
use new ghc and hp2ps to profile a simple program.
Reviewers: simonmar, bgamari, erikd
Reviewed By: simonmar
Subscribers: rwbarton, carter
GHC Trac Issues: #15904
Differential Revision: https://phabricator.haskell.org/D5346
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SMALL_MUT_ARR_PTRS_FROZEN0 -> SMALL_MUT_ARR_PTRS_FROZEN_DIRTY
SMALL_MUT_ARR_PTRS_FROZEN -> SMALL_MUT_ARR_PTRS_FROZEN_CLEAN
MUT_ARR_PTRS_FROZEN0 -> MUT_ARR_PTRS_FROZEN_DIRTY
MUT_ARR_PTRS_FROZEN -> MUT_ARR_PTRS_FROZEN_CLEAN
Naming is now consistent with other CLEAR/DIRTY objects (MVAR, MUT_VAR,
MUT_ARR_PTRS).
(alternatively we could rename MVAR_DIRTY/MVAR_CLEAN etc. to MVAR0/MVAR)
Removed a few comments in Scav.c about FROZEN0 being on the mut_list
because it's now clear from the closure type.
Reviewers: bgamari, simonmar, erikd
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4784
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we inexplicably disabled support for `-hT` profiling in the profiled
way. Admittedly, there are relatively few cases where one would prefer -hT to
`-hd`, but the option should nevertheless be available for the sake of
consistency.
Note that this does mean that there is a bit of an inconsistency in the behavior
of `-h`: in the profiled way `-h` behaves like `-hc` whereas in the non-profiled
way it defaults to `-hT`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This shims out fopen and sopen so that they use modern APIs under the hood
along with namespaced paths.
This lifts the MAX_PATH restrictions from Haskell programs and makes the new
limit ~32k.
There are only some slight caveats that have been documented.
Some utilities have not been upgraded such as lndir, since all these things are
different cabal packages I have been forced to copy the source in different places
which is less than ideal. But it's the only way to keep sdist working.
Test Plan: ./validate
Reviewers: hvr, bgamari, erikd, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #10822
Differential Revision: https://phabricator.haskell.org/D4416
|
| |
|
|
|
|
| |
Our new CPP linter enforces this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Fixes (#12636).
- changes all the typecasts to _unsinged long long_ to
have the format specifiers work.
Reviewers: austin, bgamari, erikd, simonmar, Phyx
Reviewed By: erikd, Phyx
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D3129
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate on lots of platforms
Reviewers: erikd, simonmar, austin
Reviewed By: erikd, simonmar
Subscribers: michalt, thomie
Differential Revision: https://phabricator.haskell.org/D2699
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We currently have two info tables for a constructor
* XXX_con_info: the info table for a heap-resident instance of the
constructor, It has type CONSTR, or one of the specialised types like
CONSTR_1_0
* XXX_static_info: the info table for a static instance of this
constructor, which has type CONSTR_STATIC or CONSTR_STATIC_NOCAF.
I'm getting rid of the latter, and using the `con_info` info table for
both static and dynamic constructors. For rationale and more details
see Note [static constructors] in SMRep.hs.
I also removed these macros: `isSTATIC()`, `ip_STATIC()`,
`closure_STATIC()`, since they relied on the CONSTR/CONSTR_STATIC
distinction, and anyway HEAP_ALLOCED() does the same job.
Test Plan: validate
Reviewers: bgamari, simonpj, austin, gcampax, hvr, niteria, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2690
GHC Trac Issues: #12455
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Biographical profiling is not thread-safe as documented in #12019. Throw
an error when it is used in this way.
Test Plan: Validate
Reviewers: simonmar, austin, erikd
Reviewed By: erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2516
GHC Trac Issues: #12019
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This brings in initial support for compact regions, as described in the
ICFP 2015 paper "Efficient Communication and Collection with Compact
Normal Forms" (Edward Z. Yang et.al.) and implemented by Giovanni
Campagna.
Some things may change before the 8.2 release, but I (Simon M.) wanted
to get the main patch committed so that we can iterate.
What documentation there is is in the Data.Compact module in the new
compact package. We'll need to extend and polish the documentation
before the release.
Test Plan:
validate
(new test cases included)
Reviewers: ezyang, simonmar, hvr, bgamari, austin
Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd
Differential Revision: https://phabricator.haskell.org/D1264
GHC Trac Issues: #11493
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Try it
Reviewers: hvr, simonmar, austin, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1722
GHC Trac Issues: #11094
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The aim here is to reduce the number of remote memory accesses on
systems with a NUMA memory architecture, typically multi-socket servers.
Linux provides a NUMA API for doing two things:
* Allocating memory local to a particular node
* Binding a thread to a particular node
When given the +RTS --numa flag, the runtime will
* Determine the number of NUMA nodes (N) by querying the OS
* Assign capabilities to nodes, so cap C is on node C%N
* Bind worker threads on a capability to the correct node
* Keep a separate free lists in the block layer for each node
* Allocate the nursery for a capability from node-local memory
* Allocate blocks in the GC from node-local memory
For example, using nofib/parallel/queens on a 24-core 2-socket machine:
```
$ ./Main 15 +RTS -N24 -s -A64m
Total time 173.960s ( 7.467s elapsed)
$ ./Main 15 +RTS -N24 -s -A64m --numa
Total time 150.836s ( 6.423s elapsed)
```
The biggest win here is expected to be allocating from node-local
memory, so that means programs using a large -A value (as here).
According to perf, on this program the number of remote memory accesses
were reduced by more than 50% by using `--numa`.
Test Plan:
* validate
* There's a new flag --debug-numa=<n> that pretends to do NUMA without
actually making the OS calls, which is useful for testing the code
on non-NUMA systems.
* TODO: I need to add some unit tests
Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2199
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In addition to more const-correctness fixes this patch fixes an
infelicity of the previous const-correctness patch (995cf0f356) which
left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter
but returning a non-const pointer. Here we restore the original type
signature of `UNTAG_CLOSURE` and add a new function
`UNTAG_CONST_CLOSURE` which takes and returns a const `StgClosure`
pointer and uses that wherever possible.
Test Plan: Validate on Linux, OS X and Windows
Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi
Reviewed By: simonmar, trofi
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2231
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a function takes a pointer parameter and doesn't update what
the pointer points to, we can add `const` to the parameter
declaration to document that no updates occur.
Test Plan: Validate on Linux, OS X and Windows
Reviewers: austin, Phyx, bgamari, simonmar, hsyl20
Reviewed By: bgamari, simonmar, hsyl20
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can't define Stg{Int,Word} in terms of {,u}intptr_t because STG
depends on them being the exact same size as void*, and {,u}intptr_t
does not make that guarantee. Furthermore, we also need to define
StgHalf{Int,Word}, so the preprocessor if needs to stay. But we can at
least keep it in a single place instead of repeating it in various
files.
Also define STG_{INT,WORD}{8,16,32,64}_{MIN,MAX} and use it in HsFFI.h,
further reducing the need for CPP in other files.
Reviewers: austin, bgamari, simonmar, hvr, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2182
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On x64 Windows `sizeof long` is 4 which could easily overflow
resulting in incorrect heap profiling results. This change does not
affect either Linux or OS X where `sizeof long` == `sizeof ssize_t`
regardless of machine word size.
Test Plan: Validate on Linux and Windows
Reviewers: hsyl20, bgamari, simonmar, austin
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2177
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `nat` type was an alias for `unsigned int` with a comment saying
it was at least 32 bits. We keep the typedef in case client code is
using it but mark it as deprecated.
Test Plan: Validated on Linux, OS X and Windows
Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20
Differential Revision: https://phabricator.haskell.org/D2166
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The heap census now handles large ARR_WORDS objects which have
been shrunk by shrinkMutableByteArray# or resizeMutableByteArray#.
Test Plan: ./validate && make test WAY=profasm
Reviewers: hvr, bgamari, austin, thomie
Reviewed By: thomie
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2005
GHC Trac Issues: #11627
|