path: root/rts/ProfHeap.c
Commit message | Author | Age | Files | Lines
* Profiling by info table mode (-hi) | Matthew Pickering | 2021-03-03 | 1 | -0/+9

This profiling mode creates bands by the address of the info table for each closure. This provides a much more fine-grained profiling output than any of the other profiling modes.

The `-hi` profiling mode does not require a profiling build.

* Remove the -xt heap profiling option | Matthew Pickering | 2021-02-27 | 1 | -22/+0

It should be left to tooling to perform the filtering to remove these specific closure types from the profile if desired.

Fixes #16795.

* rts: ProfHeap: Move definitions for Census to new header | Daniel Gröber | 2021-02-17 | 1 | -50/+16

* rts: ProfHeap: Merge some redundant ifdefs | Daniel Gröber | 2021-02-17 | 1 | -10/+1

* rts: TraverseHeap: Move "flip" bit into traverseState struct | Daniel Gröber | 2021-02-17 | 1 | -2/+2

* rts: Implement heap census support for pinned objects | Ben Gamari | 2021-01-07 | 1 | -29/+21

It turns out that this was fairly straightforward to implement since we are now pretty careful about zeroing slop.

* rts: Break up census logic | Ben Gamari | 2021-01-07 | 1 | -176/+187

Move the logic for taking censuses of "normal" and pinned blocks to their own functions.

* rts/ProfHeap: Free old allocations when reinitialising Censuses | Ben Gamari | 2020-07-03 | 1 | -0/+15

Previously when not LDV profiling we would repeatedly reinitialise `censuses[0]` with `initEra`. This failed to free the `Arena` and `HashTable` from the old census, resulting in a memory leak.

Fixes #18348.

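The leak described above comes from overwriting a census slot without first releasing what it owns. A minimal sketch of the free-before-reinitialise pattern follows. It assumes a hypothetical `census_t` whose two buffers stand in for the `Arena` and `HashTable`; `init_era`, `free_era` and `live_allocs` are illustrative names for this sketch, not the RTS definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a census. The real RTS Census owns an Arena
 * and a HashTable rather than raw buffers. */
typedef struct {
    void *arena;   /* stands in for the Arena */
    void *table;   /* stands in for the HashTable */
} census_t;

/* Number of live allocations, tracked only so the sketch can be checked. */
static int live_allocs = 0;

/* Release everything the census owns; safe to call on an empty census. */
static void free_era(census_t *c)
{
    assert(c != NULL);
    if (c->arena != NULL) { free(c->arena); c->arena = NULL; live_allocs--; }
    if (c->table != NULL) { free(c->table); c->table = NULL; live_allocs--; }
}

/* Reinitialise a census slot, freeing whatever it held before.
 * Returns 0 on success and -1 if an allocation fails. */
static int init_era(census_t *c)
{
    free_era(c);                      /* the step whose absence causes the leak */
    c->arena = malloc(64);
    if (c->arena == NULL) return -1;
    live_allocs++;
    c->table = malloc(64);
    if (c->table == NULL) { free_era(c); return -1; }
    live_allocs++;
    return 0;
}
```

With this shape, calling `init_era` repeatedly on the same slot keeps the number of live allocations constant instead of growing with every profiler run.
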
* rts/ProfHeap: Only allocate the Censuses that we need | Ben Gamari | 2020-07-03 | 1 | -1/+2

When not LDV profiling there is no reason to allocate 32 Censuses; one will do. This is a very small memory footprint optimisation, but it comes for free.

* rts: ProfHeap: Fix wrong time in last heap profile sample | Daniel Gröber | 2020-04-15 | 1 | -3/+4

We've had this longstanding issue in the heap profiler, where the time of the last sample in the profile is sometimes way off, causing the rendered graph to be quite useless for long runs.

It seems to me the problem is that we use mut_user_time() for the last sample as opposed to getRTSStats(), which we use when calling heapProfile() in GC.c. The former is equivalent to getProcessCPUTime() but the latter does some additional stuff:

    getProcessCPUTime() - end_init_cpu - stats.gc_cpu_ns - stats.nonmoving_gc_cpu_ns

So to fix this just use getRTSStats() in both places.

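The subtraction quoted above can be written as a small C function. This is a sketch only: `time_ns` and `mutator_cpu_ns` are hypothetical names, and the parameters merely mirror the four terms in the commit message rather than the actual RTS statistics structures.

```c
#include <assert.h>
#include <stdint.h>

/* Times in nanoseconds. Illustrative type, not the RTS Time type. */
typedef int64_t time_ns;

/* Mutator CPU time: total process CPU time minus the time spent in
 * initialisation, in the moving GC, and in the nonmoving GC. */
static time_ns mutator_cpu_ns(time_ns process_cpu, time_ns end_init_cpu,
                              time_ns gc_cpu, time_ns nonmoving_gc_cpu)
{
    time_ns overhead = end_init_cpu + gc_cpu + nonmoving_gc_cpu;
    assert(process_cpu >= overhead);
    return process_cpu - overhead;
}
```

If one sample is stamped with the raw process time and the others with the adjusted value, the last point lands too far to the right by exactly `overhead`, which matches the symptom described in the commit message.
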
* rts: Assert LDV_recordDead is not called for inherently used closures | Daniel Gröber | 2020-04-14 | 1 | -0/+2

The comments make it clear LDV_recordDead should not be called for inherently used closures, so add an assertion to codify this fact.

* rts: Underline some Notes as is conventional | Daniel Gröber | 2020-04-14 | 1 | -0/+1

* rts: Expand and add more notes regarding slop | Daniel Gröber | 2020-04-14 | 1 | -2/+15

* rts: ProfHeap: Fix memory leak when not compiled with profiling | Daniel Gröber | 2020-04-07 | 1 | -1/+1

If we're doing heap profiling on an unprofiled executable we keep allocating new space in initEra via nextEra on each profiler run but we don't have a corresponding freeEra call.

We do free the last era in endHeapProfiling but previous eras will have been overwritten by initEra and will never get free()ed.

Metric Decrease:
    space_leak_001

* rts: refactor and comment profile locales | Jean-Baptiste Mazon | 2020-03-09 | 1 | -15/+52

* rts: ensure C numerics in heap profiles using Windows locales if needed | Jean-Baptiste Mazon | 2020-03-09 | 1 | -19/+32

* Fix Windows breakage by not touching locales on Windows | Jean-Baptiste Mazon | 2020-03-09 | 1 | -0/+19

* nonmoving-gc: Track time usage of nonmoving marking | Ben Gamari | 2020-03-05 | 1 | -1/+3

* rts: enforce POSIX numeric locale for heap profiles | Jean-Baptiste Mazon | 2020-02-29 | 1 | -0/+30

* Handle large ARR_WORDS in heap census (fix #17572) | Sylvain Henry | 2019-12-19 | 1 | -0/+16

We can do a heap census with a non-profiling RTS. With a non-profiling RTS we don't zero superfluous bytes of shrunk arrays, hence the need to handle this case specifically to avoid a crash.

Revert part of a586b33f8e8ad60b5c5ef3501c89e9b71794bbed

* eventlog: Dump cost centre stack on each sample | Matthew Pickering | 2019-10-23 | 1 | -13/+0

With this change it is possible to reconstruct the timing portion of a `.prof` file after the fact. By logging the stacks at each time point, a more precise execution trace of the program can be observed rather than all identical cost centres being identified in the report.

There are two new events:

  1. `EVENT_PROF_BEGIN` - emitted at the start of profiling to communicate the tick interval
  2. `EVENT_PROF_SAMPLE_COST_CENTRE` - emitted on each tick to communicate the current call stack.

Fixes #17322

* rts: Generalise profiling heap traversal flip bit handling | Daniel Gröber | 2019-09-22 | 1 | -2/+2

This commit starts renaming some flip bit related functions for the generalised heap traversal code and adds provisions for sharing the per-closure profiling header field, currently used exclusively for retainer profiling, with other heap traversal profiling modes.

* eventlog: Add biographical and retainer profiling traces | Matthew Pickering | 2019-09-17 | 1 | -2/+35

This patch adds a new eventlog event which indicates the start of a biographical profiler sample. These are different to normal events as they also include the timestamp of when the census took place. This is because the LDV profiler only emits samples at the end of the run.

Now all the different profiling modes emit consumable events to the eventlog.

* rts: Always truncate output files | Ben Gamari | 2019-08-02 | 1 | -1/+1

Previously there were numerous places in the RTS where we would fopen with the "w" flag string. This is wrong as it will not truncate the file. Consequently if we write less data than the previous length of the file we will leave garbage at its end.

Fixes #16993.

* rts: Divorce init of Heap profiler from CCS profiler | Daniel Gröber | 2019-07-16 | 1 | -55/+41

Currently initProfiling gets defined by Profiling.c only if PROFILING is defined. Otherwise ProfHeap.c defines it.

This is just needlessly complicated, so in this commit I make Profiling and ProfHeap into properly separate modules and call their respective init functions from RtsStartup.c.

* rts: Fix -hT option with profiling rts | Daniel Gröber | 2019-07-04 | 1 | -4/+1

In dumpCensus we switch/case on doHeapProfile twice. The second switch tries to barf on unknown doHeapProfile modes but HEAP_BY_CLOSURE_TYPE is checked by the first switch and not included in the second. So when trying to pass -hT to the profiling rts it barfs.

This commit simply merges the two switches into one which fixes this problem.

* rts: Correct assertion in LDV_recordDead | Matthew Pickering | 2019-06-27 | 1 | -1/+1

It is possible that void_total is exactly equal to not_used, and the other assertions for this condition check for `<=` rather than `<`.

* rts: Correct handling of LARGE ARR_WORDS in LDV profiler | Matthew Pickering | 2019-06-27 | 1 | -13/+2

This implements the correct fix for #11627 by skipping over the slop (which is zeroed) rather than adding special case logic for LARGE ARR_WORDS which runs the risk of not performing a correct census by ignoring any subsequent blocks.

This approach implements similar logic to that in Sanity.c.

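Skipping zeroed slop amounts to advancing a word pointer until it reaches a non-zero word or the end of the block. The sketch below illustrates that idea on a plain array of words. It assumes that the first word of every live object is non-zero, and `word` and `skip_slop` are hypothetical names, not the code in ProfHeap.c or Sanity.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A machine word. Illustrative typedef, not the RTS StgWord. */
typedef uintptr_t word;

/* Advance past zeroed slop words. Stops at the first non-zero word
 * (assumed to be the header of the next object) or at `end`. */
static const word *skip_slop(const word *p, const word *end)
{
    assert(p <= end);
    while (p < end && *p == 0) {
        p++;
    }
    return p;
}
```

Because the loop never inspects object contents, it needs no special case for any particular closure type, and objects that follow the slop in the same block are still visited.
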
* Add HEAP_PROF_SAMPLE_END event to mark end of samples | Matthew Pickering | 2019-06-07 | 1 | -0/+1

This allows a user to observe how long a sampling period lasts so that the time taken can be removed from the profiling output.

Fixes #16697

* Update Trac ticket URLs to point to GitLab | Ryan Scott | 2019-03-15 | 1 | -1/+1

This moves all URL references to Trac tickets to their corresponding GitLab counterparts.

* rts: Turn ASSERT in LDV_recordDead into a normal if | Ben Gamari | 2018-12-25 | 1 | -1/+3

As reported in #15382 the `ASSERT(ctr != NULL)` is currently getting routinely hit during testsuite runs. While this is certainly a bug I would far prefer getting a proper error message than a segmentation fault. Consequently I'm turning the `ASSERT` into a proper `if` so we get a proper error in non-debug builds.

* Fix uninformative hp2ps error when the cmdline contains double quotes | Zejun Wu | 2018-12-11 | 1 | -8/+22

Reapply D5346 with a fix for incompatible shell quoting in tests. It seems like `$'string'` is not recognized under all test environments, so let's avoid it in tests.

Test Plan:
```
hp2ps: "T15904".hp, line 2: integer must follow identifier
```
use new ghc and hp2ps to profile a simple program.

Reviewers: simonmar, bgamari, erikd, tdammers
Reviewed By: bgamari
Subscribers: tdammers, carter, rwbarton
GHC Trac Issues: #15904
Differential Revision: https://phabricator.haskell.org/D5388

* Revert "Fix uninformative hp2ps error when the cmdline contains double quotes" | Ben Gamari | 2018-11-24 | 1 | -22/+8

This reverts commit 390df8b51b917fb6409cbde8e73fe838d61d8832.

* Fix uninformative hp2ps error when the cmdline contains double quotes | Zejun Wu | 2018-11-22 | 1 | -8/+22

The format of the hp file didn't allow double quotes inside strings, and under a prof build we include args in JOB, which may contain double quotes. When this happens, the error message is confusing to the user. This can also happen under a normal build if the executable name contains a double quote, which is unlikely though.

We fix this issue by introducing escaping for double quotes inside a string by repeating it twice.

We also fix a buffer overflow bug when the length of the string happens to be a multiple of 5000.

Test Plan:
new tests, which used to fail with error message:

```
hp2ps: "T15904".hp, line 2: integer must follow identifier
```

use new ghc and hp2ps to profile a simple program.

Reviewers: simonmar, bgamari, erikd
Reviewed By: simonmar
Subscribers: rwbarton, carter
GHC Trac Issues: #15904
Differential Revision: https://phabricator.haskell.org/D5346

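The escaping scheme described above, where a double quote inside a string is written twice, can be sketched as follows. This is an illustration under stated assumptions rather than the code from the patch: `escape_quotes` is a hypothetical helper, and the explicit capacity check is one way to avoid the kind of overflow the commit mentions, not necessarily the way the RTS does it.

```c
#include <assert.h>
#include <stddef.h>

/* Copy src into dst, writing each '"' twice. dst_size is the capacity of
 * dst including the terminating NUL. Returns the length of the escaped
 * string, or -1 if dst is too small. */
static long escape_quotes(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;
    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        size_t need = (*src == '"') ? 2 : 1;
        if (out + need >= dst_size) {   /* always keep one byte for the NUL */
            return -1;
        }
        if (*src == '"') {
            dst[out++] = '"';           /* the extra, escaping quote */
        }
        dst[out++] = *src;
    }
    if (out >= dst_size) {
        return -1;
    }
    dst[out] = '\0';
    return (long)out;
}
```

A reader of the escaped form treats two consecutive quotes as one literal quote and a single quote as the end of the string, so the input `say "hi"` is stored as `say ""hi""`.
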
* Rename some mutable closure types for consistency | Ömer Sinan Ağacan | 2018-06-05 | 1 | -4/+4

    SMALL_MUT_ARR_PTRS_FROZEN0 -> SMALL_MUT_ARR_PTRS_FROZEN_DIRTY
    SMALL_MUT_ARR_PTRS_FROZEN  -> SMALL_MUT_ARR_PTRS_FROZEN_CLEAN
    MUT_ARR_PTRS_FROZEN0       -> MUT_ARR_PTRS_FROZEN_DIRTY
    MUT_ARR_PTRS_FROZEN        -> MUT_ARR_PTRS_FROZEN_CLEAN

Naming is now consistent with other CLEAN/DIRTY objects (MVAR, MUT_VAR, MUT_ARR_PTRS). (Alternatively we could rename MVAR_DIRTY/MVAR_CLEAN etc. to MVAR0/MVAR.)

Removed a few comments in Scav.c about FROZEN0 being on the mut_list because it's now clear from the closure type.

Reviewers: bgamari, simonmar, erikd
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4784

* rts: Allow profiling by closure type in prof way | Ben Gamari | 2018-05-01 | 1 | -4/+1

Previously we inexplicably disabled support for `-hT` profiling in the profiled way. Admittedly, there are relatively few cases where one would prefer `-hT` to `-hd`, but the option should nevertheless be available for the sake of consistency.

Note that this does mean that there is a bit of an inconsistency in the behavior of `-h`: in the profiled way `-h` behaves like `-hc` whereas in the non-profiled way it defaults to `-hT`.

* Remove MAX_PATH restrictions from RTS, I/O manager and various utilities | Tamar Christina | 2018-03-31 | 1 | -1/+2

Summary:
This shims out fopen and sopen so that they use modern APIs under the hood along with namespaced paths.

This lifts the MAX_PATH restrictions from Haskell programs and makes the new limit ~32k.

There are only some slight caveats that have been documented.

Some utilities, such as lndir, have not been upgraded. Since all these things are different cabal packages, I have been forced to copy the source into different places, which is less than ideal, but it's the only way to keep sdist working.

Test Plan: ./validate
Reviewers: hvr, bgamari, erikd, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #10822
Differential Revision: https://phabricator.haskell.org/D4416

* Fix note references and some typos | Gabor Greif | 2017-07-26 | 1 | -1/+1

* Prefer #if defined to #ifdef | Ben Gamari | 2017-04-28 | 1 | -32/+32

Our new CPP linter enforces this.

* Typecast covers entire expression to fix format warning. | bollu | 2017-02-14 | 1 | -10/+15

  - Fixes (#12636).
  - Changes all the typecasts to _unsigned long long_ to have the format specifiers work.

Reviewers: austin, bgamari, erikd, simonmar, Phyx
Reviewed By: erikd, Phyx
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D3129

* Use C99's bool | Ben Gamari | 2016-11-29 | 1 | -38/+38

Test Plan: Validate on lots of platforms
Reviewers: erikd, simonmar, austin
Reviewed By: erikd, simonmar
Subscribers: michalt, thomie
Differential Revision: https://phabricator.haskell.org/D2699

* Remove CONSTR_STATIC | Simon Marlow | 2016-11-14 | 1 | -3/+3

Summary:
We currently have two info tables for a constructor:

  * XXX_con_info: the info table for a heap-resident instance of the constructor. It has type CONSTR, or one of the specialised types like CONSTR_1_0.
  * XXX_static_info: the info table for a static instance of this constructor, which has type CONSTR_STATIC or CONSTR_STATIC_NOCAF.

I'm getting rid of the latter, and using the `con_info` info table for both static and dynamic constructors. For rationale and more details see Note [static constructors] in SMRep.hs.

I also removed these macros: `isSTATIC()`, `ip_STATIC()`, `closure_STATIC()`, since they relied on the CONSTR/CONSTR_STATIC distinction, and anyway HEAP_ALLOCED() does the same job.

Test Plan: validate
Reviewers: bgamari, simonpj, austin, gcampax, hvr, niteria, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2690
GHC Trac Issues: #12455

* rts: Disable -hb with multiple capabilities | Ben Gamari | 2016-09-12 | 1 | -0/+7

Biographical profiling is not thread-safe as documented in #12019. Throw an error when it is used in this way.

Test Plan: Validate
Reviewers: simonmar, austin, erikd
Reviewed By: erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2516
GHC Trac Issues: #12019

* Compact Regions | Giovanni Campagna | 2016-07-20 | 1 | -0/+23

This brings in initial support for compact regions, as described in the ICFP 2015 paper "Efficient Communication and Collection with Compact Normal Forms" (Edward Z. Yang et al.) and implemented by Giovanni Campagna.

Some things may change before the 8.2 release, but I (Simon M.) wanted to get the main patch committed so that we can iterate.

What documentation there is is in the Data.Compact module in the new compact package. We'll need to extend and polish the documentation before the release.

Test Plan: validate (new test cases included)
Reviewers: ezyang, simonmar, hvr, bgamari, austin
Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd
Differential Revision: https://phabricator.haskell.org/D1264
GHC Trac Issues: #11493

* Log heap profiler samples to event log | Ben Gamari | 2016-07-16 | 1 | -3/+29

Test Plan: Try it
Reviewers: hvr, simonmar, austin, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1722
GHC Trac Issues: #11094

* NUMA support | Simon Marlow | 2016-06-10 | 1 | -0/+1

Summary:
The aim here is to reduce the number of remote memory accesses on systems with a NUMA memory architecture, typically multi-socket servers.

Linux provides a NUMA API for doing two things:

  * Allocating memory local to a particular node
  * Binding a thread to a particular node

When given the +RTS --numa flag, the runtime will

  * Determine the number of NUMA nodes (N) by querying the OS
  * Assign capabilities to nodes, so cap C is on node C%N
  * Bind worker threads on a capability to the correct node
  * Keep separate free lists in the block layer for each node
  * Allocate the nursery for a capability from node-local memory
  * Allocate blocks in the GC from node-local memory

For example, using nofib/parallel/queens on a 24-core 2-socket machine:

```
$ ./Main 15 +RTS -N24 -s -A64m
  Total time 173.960s ( 7.467s elapsed)

$ ./Main 15 +RTS -N24 -s -A64m --numa
  Total time 150.836s ( 6.423s elapsed)
```

The biggest win here is expected to be allocating from node-local memory, so that means programs using a large -A value (as here).

According to perf, on this program the number of remote memory accesses was reduced by more than 50% by using `--numa`.

Test Plan:
  * validate
  * There's a new flag --debug-numa=<n> that pretends to do NUMA without actually making the OS calls, which is useful for testing the code on non-NUMA systems.
  * TODO: I need to add some unit tests

Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2199

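The capability-to-node assignment stated above (cap C is on node C%N) is a round-robin mapping. A minimal sketch, where `cap_to_node` is a hypothetical helper name and not an RTS function:

```c
#include <assert.h>

/* Round-robin assignment of capability numbers to NUMA nodes:
 * capability `cap` is placed on node `cap % n_nodes`. */
static unsigned cap_to_node(unsigned cap, unsigned n_nodes)
{
    assert(n_nodes > 0);
    return cap % n_nodes;
}
```

On the 2-socket machine from the example, this places even-numbered capabilities on node 0 and odd-numbered capabilities on node 1, so the 24 capabilities are split evenly.
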
* rts: More const correct-ness fixes | Erik de Castro Lopo | 2016-05-18 | 1 | -9/+9

In addition to more const-correctness fixes this patch fixes an infelicity of the previous const-correctness patch (995cf0f356) which left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter but returning a non-const pointer. Here we restore the original type signature of `UNTAG_CLOSURE` and add a new function `UNTAG_CONST_CLOSURE` which takes and returns a const `StgClosure` pointer and uses that wherever possible.

Test Plan: Validate on Linux, OS X and Windows
Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi
Reviewed By: simonmar, trofi
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2231

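The pairing described above, one untagging function per constness, can be sketched as follows. Everything here is illustrative: `closure`, `TAG_MASK` (three low bits is an assumption of the sketch), and the lower-case function names are stand-ins, not the RTS definitions of `StgClosure`, `UNTAG_CLOSURE` or `UNTAG_CONST_CLOSURE`.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in closure type, forced to 8-byte alignment so that the three
 * low pointer bits are free to carry a tag. */
typedef struct closure {
    _Alignas(8) uintptr_t header;
} closure;

/* Assumption for this sketch: tags live in the low three bits. */
#define TAG_MASK ((uintptr_t)7)

/* Non-const in, non-const out. */
static closure *untag_closure(closure *p)
{
    return (closure *)((uintptr_t)p & ~TAG_MASK);
}

/* Const in, const out: constness is never silently dropped. */
static const closure *untag_const_closure(const closure *p)
{
    return (const closure *)((uintptr_t)p & ~TAG_MASK);
}

/* Helper used only to build a tagged pointer for demonstration. */
static const closure *tag_const_closure(const closure *p, uintptr_t tag)
{
    assert(tag <= TAG_MASK);
    assert(((uintptr_t)p & TAG_MASK) == 0);
    return (const closure *)((uintptr_t)p | tag);
}
```

Keeping two functions lets code that only reads a closure stay entirely in `const` pointers, while code that mutates keeps the original signature.
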
* rts: Make function pointer parameters `const` where possible | Erik de Castro Lopo | 2016-05-12 | 1 | -4/+4

If a function takes a pointer parameter and doesn't update what the pointer points to, we can add `const` to the parameter declaration to document that no updates occur.

Test Plan: Validate on Linux, OS X and Windows
Reviewers: austin, Phyx, bgamari, simonmar, hsyl20
Reviewed By: bgamari, simonmar, hsyl20
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2200

* Use stdint types for Stg{Word,Int}{8,16,32,64} | Tomas Carnecky | 2016-05-10 | 1 | -1/+1

We can't define Stg{Int,Word} in terms of {,u}intptr_t because STG depends on them being the exact same size as void*, and {,u}intptr_t does not make that guarantee. Furthermore, we also need to define StgHalf{Int,Word}, so the preprocessor `#if` needs to stay. But we can at least keep it in a single place instead of repeating it in various files.

Also define STG_{INT,WORD}{8,16,32,64}_{MIN,MAX} and use it in HsFFI.h, further reducing the need for CPP in other files.

Reviewers: austin, bgamari, simonmar, hvr, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2182

* rts/ProfHeap.c: Use `ssize_t` instead of `long`. | Erik de Castro Lopo | 2016-05-08 | 1 | -21/+22

On x64 Windows `sizeof long` is 4, which could easily overflow, resulting in incorrect heap profiling results. This change does not affect either Linux or OS X, where `sizeof long` == `sizeof ssize_t` regardless of machine word size.

Test Plan: Validate on Linux and Windows
Reviewers: hsyl20, bgamari, simonmar, austin
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2177

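To make the overflow risk concrete, the sketch below checks whether a byte count fits in a 32-bit signed counter, which is how wide `long` is on x64 Windows according to the commit message. `fits_in_32bit_long` is a hypothetical helper for illustration only, and `int64_t` stands in for a 64-bit `ssize_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if `bytes` can be stored in a 32-bit signed counter
 * (modelling a 4-byte `long`), and 0 if it would overflow. */
static int fits_in_32bit_long(int64_t bytes)
{
    assert(bytes >= 0);
    return bytes <= INT32_MAX;
}
```

A census total of 3 GiB already fails this check, so a heap of that size cannot be counted correctly with a 4-byte `long`, whereas a 64-bit counter has ample room.
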