path: root/rts/Profiling.c
Commit message | Author | Age | Files | Lines
* Make `PosixSource.h` installed and under `rts/` | John Ericson | 2021-08-09 | 1 | -1/+1
  `PosixSource.h` is used outside of the RTS, so we install it rather than just fishing it out of the repo in an ad-hoc way. This makes packages in this repo more self-contained.
* eventlog: Repost initialisation events when eventlog restarts | Matthew Pickering | 2021-03-08 | 1 | -1/+1
  If startEventlog is called after the program has already started running then quite a few useful events are missing from the eventlog because they are only posted when the program starts. This patch adds a mechanism to declare that an event should be reposted every time the startEventlog function is called.

  Now in EventLog.c there is a global list of functions called `eventlog_header_funcs` which stores a list of functions which should be called every time the eventlog starts.

  When calling `postInitEvent`, the event will not only be immediately posted to the eventlog but also added to the global list.

  When startEventLog is called, the list is traversed and the events reposted.
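The repost mechanism described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code in EventLog.c: the list name `eventlog_header_funcs` and the function name `postInitEvent` come from the commit message, while the node type, the `EventlogInitPost` callback type, `repostInitEvents`, and the `demo_post` callback are assumptions made for the example.

```c
#include <stdlib.h>

/* Assumed callback type: a function that posts one header event. */
typedef void (*EventlogInitPost)(void);

/* Assumed node type for the global list. */
typedef struct eventlog_init_func {
    EventlogInitPost post_init;
    struct eventlog_init_func *next;
} eventlog_init_func_t;

/* Global list of functions to re-run whenever the eventlog starts. */
eventlog_init_func_t *eventlog_header_funcs = NULL;

/* Post the event immediately and remember it for later restarts. */
void postInitEvent(EventlogInitPost post_init)
{
    post_init();

    eventlog_init_func_t *node = malloc(sizeof *node);
    if (node == NULL) {
        abort();  /* sketch only: no recovery on allocation failure */
    }
    node->post_init = post_init;
    node->next = NULL;

    /* Append at the tail so events are replayed in their original order. */
    eventlog_init_func_t **tail = &eventlog_header_funcs;
    while (*tail != NULL) {
        tail = &(*tail)->next;
    }
    *tail = node;
}

/* Hypothetical hook, to be called when the eventlog is (re)started. */
void repostInitEvents(void)
{
    for (eventlog_init_func_t *f = eventlog_header_funcs; f != NULL; f = f->next) {
        f->post_init();
    }
}

/* Hypothetical example event: it only counts how often it was posted. */
int demo_posts = 0;
void demo_post(void)
{
    demo_posts++;
}
```

Calling `postInitEvent(demo_post)` posts once straight away; each later call to `repostInitEvents()` posts it again.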
* rts/linker: Initialise CCSs from native shared objects | Ben Gamari | 2020-11-30 | 1 | -1/+1
* Fix typos, via a Levenshtein-style corrector | Brian Wignall | 2020-01-04 | 1 | -1/+1
* eventlog: Dump cost centre stack on each sample | Matthew Pickering | 2019-10-23 | 1 | -1/+19
  With this change it is possible to reconstruct the timing portion of a `.prof` file after the fact. By logging the stacks at each time point, a more precise execution trace of the program can be observed, rather than all identical cost centres being identified in the report.

  There are two new events:

  1. `EVENT_PROF_BEGIN` - emitted at the start of profiling to communicate the tick interval
  2. `EVENT_PROF_SAMPLE_COST_CENTRE` - emitted on each tick to communicate the current call stack.

  Fixes #17322
* rts: Always truncate output files | Ben Gamari | 2019-08-02 | 1 | -1/+1
  Previously there were numerous places in the RTS where we would fopen with the "w" flag string. This is wrong as it will not truncate the file. Consequently if we write less data than the previous length of the file we will leave garbage at its end.

  Fixes #16993.
* rts: Rename the nondescript initProfiling2 to refreshProfilingCCSs | Daniel Gröber | 2019-07-16 | 1 | -2/+2
* rts: Divorce init of Heap profiler from CCS profiler | Daniel Gröber | 2019-07-16 | 1 | -23/+0
  Currently initProfiling gets defined by Profiling.c only if PROFILING is defined. Otherwise ProfHeap.c defines it.

  This is just needlessly complicated, so in this commit I make Profiling and ProfHeap into properly separate modules and call their respective init functions from RtsStartup.c.
* Documentation and refactoring in CCS related code | Ömer Sinan Ağacan | 2019-01-12 | 1 | -30/+56
  - Remove REGISTER_CC and REGISTER_CCS macros, add functions registerCC and registerCCS to Profiling.c.
  - Reduce scope of symbols: CC_LIST, CCS_LIST, CC_ID, CCS_ID
  - Document CC_LIST and CCS_LIST
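As a rough illustration of what replacing a registration macro with a function looks like, here is a minimal sketch. The names `registerCC`, `CC_LIST`, and `CC_ID` come from the commit message; the struct layout is a simplified stand-in and the function body is an assumption about the general shape, not the actual Profiling.c code. `registerCCS` would presumably follow the same pattern with `CCS_LIST` and `CCS_ID`.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a cost centre; only the fields needed for
   registration are shown. */
typedef struct CostCentre_ {
    int ccID;                  /* assigned at registration time */
    const char *label;
    struct CostCentre_ *link;  /* next cost centre in CC_LIST */
} CostCentre;

/* File-local state, reflecting the reduced symbol scope. */
static CostCentre *CC_LIST = NULL;  /* all registered cost centres */
static int CC_ID = 1;               /* next ID to hand out */

/* Push a cost centre onto CC_LIST and give it a fresh ID.
   Assumption: an unset link field marks a cost centre as unregistered. */
void registerCC(CostCentre *cc)
{
    assert(cc != NULL);
    if (cc->link == NULL) {
        cc->link = CC_LIST;
        CC_LIST = cc;
        cc->ccID = CC_ID++;
    }
}
```

Compared with a macro, the function gives the compiler a typed parameter to check and keeps `CC_LIST` and `CC_ID` private to one file.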
* Implement a sanity check for CCS fields in profiling builds | Ömer Sinan Ağacan | 2019-01-10 | 1 | -0/+4
  This helped me debug one of the bugs in #15508. I'm not sure if this is a good idea, but it worked for me, so I wanted to submit this as an MR.
* Minor refactoring and documentation in profiling RTS code | Ömer Sinan Ağacan | 2019-01-03 | 1 | -41/+20
* Remove MAX_PATH restrictions from RTS, I/O manager and various utilities | Tamar Christina | 2018-03-31 | 1 | -2/+3
  Summary:
  This shims out fopen and sopen so that they use modern APIs under the hood along with namespaced paths.

  This lifts the MAX_PATH restrictions from Haskell programs and makes the new limit ~32k.

  There are only some slight caveats that have been documented.

  Some utilities, such as lndir, have not been upgraded. Since all these things are different cabal packages, I have been forced to copy the source in different places, which is less than ideal, but it's the only way to keep sdist working.

  Test Plan: ./validate
  Reviewers: hvr, bgamari, erikd, simonmar
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie, carter
  GHC Trac Issues: #10822
  Differential Revision: https://phabricator.haskell.org/D4416
* Speed up compilation of profiling stubs | Ben Gamari | 2017-08-16 | 1 | -0/+19
  Here we encode the cost centre list as static data. This means that the initialization stubs are small functions which should be easy for GCC to compile, even with optimization.

  Fixes #7960.

  Test Plan: Test profiling
  Reviewers: austin, erikd, simonmar
  Reviewed By: simonmar
  Subscribers: rwbarton, thomie
  GHC Trac Issues: #7960
  Differential Revision: https://phabricator.haskell.org/D3853
* Prefer #if defined to #ifdef | Ben Gamari | 2017-04-28 | 1 | -8/+8
  Our new CPP linter enforces this.
* rts: Fix build | Ben Gamari | 2017-02-28 | 1 | -0/+1
  I evidently neglected to consider that validate doesn't build profiled ways. Arg.
* rts: Allow profile output path to be specified on RTS command line | Ben Gamari | 2017-02-28 | 1 | -16/+23
  This introduces an RTS option, -po, which allows the user to override the stem used to form the output file names of the heap profile and cost center summary.

  It's a bit unclear to me whether this is really the interface we want. Alternatively we could just allow the user to specify the `.hp` and `.prof` file names separately. This would arguably be a bit more straightforward and would allow the user to name JSON output with an appropriate `.json` suffix if they so desired. However, this would come at the cost of taking more of the option space, which is a somewhat precious commodity.

  Test Plan: Validate, try using `-po` RTS option
  Reviewers: simonmar, austin, erikd
  Reviewed By: simonmar
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D3182
* JSON profiler reports | Ben Gamari | 2017-02-23 | 1 | -1/+23
  This introduces a JSON output format for cost-centre profiler reports. It's not clear whether this is really something we want to introduce given that we may also move to a more Haskell-driven output pipeline in the future, but I nevertheless found this helpful, so I thought I would put it up.

  Test Plan: Compile a program with `-prof -fprof-auto`; run with `+RTS -pj`
  Reviewers: austin, erikd, simonmar
  Reviewed By: simonmar
  Subscribers: duncan, maoe, thomie, simonmar
  Differential Revision: https://phabricator.haskell.org/D3132
* rts/Profiling: Factor out report generation | Ben Gamari | 2017-02-11 | 1 | -314/+8
  Here we move the actual report generation logic to `rts/ProfilerReport.c`. This break is actually quite clean,

      void writeCCSReport( FILE *prof_file, CostCentreStack const *ccs,
                           ProfilerTotals totals );

  This is more profiler refactoring in preparation for machine-readable output.

  Test Plan: Validate
  Reviewers: austin, erikd, simonmar
  Reviewed By: simonmar
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D3097
* rts/Profiling: Kill a few globals and add consts | Ben Gamari | 2017-02-11 | 1 | -50/+63
  Previously it was quite difficult to follow the dataflow through this file due to global mutation and rather non-descriptive types.

  This is a cleanup in preparation for factoring out the report-generating logic, which is itself in preparation for someday teaching the profiler to produce more machine-readable reports (JSON perhaps?).

  Test Plan: Validate
  Reviewers: austin, erikd, simonmar
  Reviewed By: simonmar
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D3096
* More fixes for #5654 | Simon Marlow | 2017-01-06 | 1 | -1/+5
  * In stg_ap_0_fast, if we're evaluating a thunk, the thunk might evaluate to a function in which case we may have to adjust its CCS.
  * The interpreter has its own implementation of stg_ap_0_fast, so we have to do the same shenanigans with creating empty PAPs and copying PAPs there.
  * GHCi creates Cost Centres as children of CCS_MAIN, which enterFunCCS() wrongly assumed to imply that they were CAFs. Now we use the is_caf flag for this, which we have to correctly initialise when we create a Cost Centre in GHCi.
* Use C99's bool | Ben Gamari | 2016-11-29 | 1 | -18/+10
  Test Plan: Validate on lots of platforms
  Reviewers: erikd, simonmar, austin
  Reviewed By: erikd, simonmar
  Subscribers: michalt, thomie
  Differential Revision: https://phabricator.haskell.org/D2699
* Remove CONSTR_STATIC | Simon Marlow | 2016-11-14 | 1 | -2/+1
  Summary:
  We currently have two info tables for a constructor

  * XXX_con_info: the info table for a heap-resident instance of the constructor. It has type CONSTR, or one of the specialised types like CONSTR_1_0
  * XXX_static_info: the info table for a static instance of this constructor, which has type CONSTR_STATIC or CONSTR_STATIC_NOCAF.

  I'm getting rid of the latter, and using the `con_info` info table for both static and dynamic constructors. For rationale and more details see Note [static constructors] in SMRep.hs.

  I also removed these macros: `isSTATIC()`, `ip_STATIC()`, `closure_STATIC()`, since they relied on the CONSTR/CONSTR_STATIC distinction, and anyway HEAP_ALLOCED() does the same job.

  Test Plan: validate
  Reviewers: bgamari, simonpj, austin, gcampax, hvr, niteria, erikd
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D2690
  GHC Trac Issues: #12455
* Show sources of cost centers in .prof | Ömer Sinan Ağacan | 2016-06-08 | 1 | -18/+94
  This fixes the problem with duplicate cost-centre names that was reported a couple of times before. When a module implements a typeclass multiple times for different types, methods of different implementations get same cost-centre names and are reported like this:

    COST CENTRE MODULE            %time %alloc

    CAF         GHC.IO.Handle.FD    0.0   32.8
    CAF         GHC.Read            0.0    1.0
    CAF         GHC.IO.Encoding     0.0    1.8
    showsPrec   Main                0.0    1.2
    readPrec    Main                0.0   19.4
    readPrec    Main                0.0   20.5
    main        Main                0.0   20.2

                                       individual      inherited
    COST CENTRE   MODULE  no.  entries  %time %alloc   %time %alloc

    MAIN          MAIN     53        0    0.0    0.2     0.0  100.0
     CAF          Main    105        0    0.0    0.3     0.0   62.5
      readPrec    Main    109        1    0.0    0.6     0.0    0.6
      readPrec    Main    107        1    0.0    0.6     0.0    0.6
      main        Main    106        1    0.0   20.2     0.0   61.0
       ==         Main    114        1    0.0    0.0     0.0    0.0
       ==         Main    113        1    0.0    0.0     0.0    0.0
       showsPrec  Main    112        2    0.0    1.2     0.0    1.2
       showsPrec  Main    111        2    0.0    0.9     0.0    0.9
       readPrec   Main    110        0    0.0   18.8     0.0   18.8
       readPrec   Main    108        0    0.0   19.9     0.0   19.9

  It's not possible to tell from the report which `==` took how long. This patch adds one more column at the cost of making outputs wider. The report now looks like this:

    COST CENTRE MODULE            SRC                       %time %alloc

    CAF         GHC.IO.Handle.FD  <entire-module>             0.0   32.9
    CAF         GHC.IO.Encoding   <entire-module>             0.0    1.8
    CAF         GHC.Read          <entire-module>             0.0    1.0
    showsPrec   Main              Main_1.hs:7:19-22           0.0    1.2
    readPrec    Main              Main_1.hs:7:13-16           0.0   19.5
    readPrec    Main              Main_1.hs:4:13-16           0.0   20.5
    main        Main              Main_1.hs:(10,1)-(20,20)    0.0   20.2

                                                                     individual      inherited
    COST CENTRE   MODULE         SRC                       no.  entries  %time %alloc   %time %alloc

    MAIN          MAIN           <built-in>                 53        0    0.0    0.2     0.0  100.0
     CAF          Main           <entire-module>           105        0    0.0    0.3     0.0   62.5
      readPrec    Main           Main_1.hs:7:13-16         109        1    0.0    0.6     0.0    0.6
      readPrec    Main           Main_1.hs:4:13-16         107        1    0.0    0.6     0.0    0.6
      main        Main           Main_1.hs:(10,1)-(20,20)  106        1    0.0   20.2     0.0   61.0
       ==         Main           Main_1.hs:7:25-26         114        1    0.0    0.0     0.0    0.0
       ==         Main           Main_1.hs:4:25-26         113        1    0.0    0.0     0.0    0.0
       showsPrec  Main           Main_1.hs:7:19-22         112        2    0.0    1.2     0.0    1.2
       showsPrec  Main           Main_1.hs:4:19-22         111        2    0.0    0.9     0.0    0.9
       readPrec   Main           Main_1.hs:7:13-16         110        0    0.0   18.8     0.0   18.8
       readPrec   Main           Main_1.hs:4:13-16         108        0    0.0   19.9     0.0   19.9
     CAF          Text.Read.Lex  <entire-module>           102        0    0.0    0.5     0.0    0.5

  To fix failing test cases because of different orderings of cost centres (e.g. optimized and non-optimized build printing in different order), with this patch we also start sorting cost centres before printing.

  The order depends on 1) entries (more entered cost centres come first) 2) names (using strcmp() on cost centre names).

  Reviewers: simonmar, austin, erikd, bgamari
  Reviewed By: simonmar, bgamari
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D2282
  GHC Trac Issues: #11543, #8473, #7105
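The ordering rule stated in the commit above (more entries first, then names compared with strcmp()) can be written as a comparison function. The sketch below applies it to a flat array with `qsort`. The `CCEntry` type and both function names are hypothetical, and the actual patch may well sort a different data structure, so treat this only as an illustration of the rule.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat record holding the two sort keys. */
typedef struct {
    const char *label;           /* cost centre name */
    unsigned long long entries;  /* number of times it was entered */
} CCEntry;

/* Order by entries (descending), then by name via strcmp(). */
static int compareCCEntries(const void *a, const void *b)
{
    const CCEntry *x = a;
    const CCEntry *y = b;

    if (x->entries != y->entries) {
        /* Compare explicitly rather than subtracting, to avoid overflow. */
        return x->entries > y->entries ? -1 : 1;
    }
    return strcmp(x->label, y->label);
}

/* Sort an array of n records in place. */
void sortCCEntries(CCEntry *arr, size_t n)
{
    qsort(arr, n, sizeof *arr, compareCCEntries);
}
```

Because both keys are properties of the cost centres themselves, the resulting order does not depend on the order in which they were created, which is what makes test output stable across optimized and non-optimized builds.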
* rts: More const correct-ness fixes | Erik de Castro Lopo | 2016-05-18 | 1 | -3/+3
  In addition to more const-correctness fixes this patch fixes an infelicity of the previous const-correctness patch (995cf0f356) which left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter but returning a non-const pointer. Here we restore the original type signature of `UNTAG_CLOSURE` and add a new function `UNTAG_CONST_CLOSURE` which takes and returns a const `StgClosure` pointer and uses that wherever possible.

  Test Plan: Validate on Linux, OS X and Windows
  Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi
  Reviewed By: simonmar, trofi
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D2231
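The const/non-const pairing described in the commit above might look like the following sketch. The two function names come from the commit message. The opaque `StgClosure` declaration, the `TAG_MASK` value (low three bits), and the function bodies are assumptions for illustration only.

```c
#include <stdint.h>

/* Opaque stand-in for the RTS closure type. */
typedef struct StgClosure_ StgClosure;

/* Assumed tag mask: the low three bits of a pointer carry the tag. */
#define TAG_MASK ((uintptr_t)7)

/* Non-const in, non-const out. */
static inline StgClosure *UNTAG_CLOSURE(StgClosure *p)
{
    return (StgClosure *)((uintptr_t)p & ~TAG_MASK);
}

/* Const in, const out: constness is never silently dropped. */
static inline const StgClosure *UNTAG_CONST_CLOSURE(const StgClosure *p)
{
    return (const StgClosure *)((uintptr_t)p & ~TAG_MASK);
}
```

Keeping two functions means a caller holding a `const StgClosure *` cannot obtain a writable pointer just by untagging it.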
* rts: Replace `nat` with `uint32_t` | Erik de Castro Lopo | 2016-05-05 | 1 | -25/+26
  The `nat` type was an alias for `unsigned int` with a comment saying it was at least 32 bits. We keep the typedef in case client code is using it but mark it as deprecated.

  Test Plan: Validated on Linux, OS X and Windows
  Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20
  Differential Revision: https://phabricator.haskell.org/D2166
* rts: mark 'ccs_mutex' and 'prof_arena' as static | Sergei Trofimovich | 2016-02-07 | 1 | -2/+2
  Noticed by uselex.rb:

      ccs_mutex: [R]: exported from: ./rts/dist/build/Profiling.thr_p_o
      prof_arena: [R]: exported from: ./rts/dist/build/Profiling.p_o

  Signed-off-by: Sergei Trofimovich <siarheit@google.com>
* Fix the Windows build | Thomas Miedema | 2016-01-29 | 1 | -1/+1
* Fix segmentation fault when .prof file not writeable | Thomas Miedema | 2016-01-26 | 1 | -8/+5
  There are two ways to do retainer profiling. Quoting from the user's guide:

  1. `+RTS -hr` "Breaks down the graph by retainer set"
  2. `+RTS -hr<cc> -h<x>`, where `-h<x>` is one of the normal heap profiling break-down options (e.g. `-hc`), and `-hr<cc>` means "Restrict the profile to closures with retainer sets containing cost-centre stacks with one of the specified cost centres at the top."

  Retainer profiling writes to a .hp file, like the other heap profiling options, but also to a .prof file. Therefore, when the .prof file is not writeable for whatever reason, retainer profiling should be turned off completely.

  This worked ok when running the program with `+RTS -hr` (option 1), but a segfault would occur when using `+RTS -hr<cc> -h<x>`, with `x!=r` (option 2).

  This commit fixes that.

  Reviewed by: bgamari
  Differential Revision: https://phabricator.haskell.org/D1849
  GHC Trac Issues: #11489
* Maintain cost-centre stacks in the interpreter | Simon Marlow | 2015-12-21 | 1 | -0/+9
  Summary:
  Breakpoints become SCCs, so we have detailed call-stack info for interpreted code. Currently this only works when GHC is compiled with -prof, but D1562 (Remote GHCi) removes this constraint so that in the future call stacks will be available without building your own GHCi.

  How can you get a stack trace?

  * programmatically: GHC.Stack.currentCallStack
  * I've added an experimental :where command that shows the stack when stopped at a breakpoint
  * `error` attaches a call stack automatically, although since calls to `error` are often lifted out to the top level, this is less useful than it might be (ImplicitParams still works though).
  * Later we might attach call stacks to all exceptions

  Other related changes in this diff:

  * I reduced the number of places that get ticks attached for breakpoints. In particular there was a breakpoint around the whole declaration, which was often redundant because it bound no variables. This reduces clutter in the stack traces and speeds up compilation.
  * I tidied up some RealSrcSpan stuff in InteractiveUI, and made a few other small cleanups

  Test Plan: validate
  Reviewers: ezyang, bgamari, austin, hvr
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D1595
  GHC Trac Issues: #11047
* Make GHCi & TH work when the compiler is built with -prof | Simon Marlow | 2015-11-07 | 1 | -22/+25
  Summary:
  Amazingly, there were zero changes to the byte code generator and very few changes to the interpreter - mainly because we've used good abstractions that hide the differences between profiling and non-profiling. So that bit was pleasantly straightforward, but there were a pile of other wibbles to get the whole test suite through.

  Note that a compiler built with -prof is now like one built with -dynamic, in that to use TH you have to build the code the same way. For dynamic, we automatically enable -dynamic-too when TH is required, but we don't have anything equivalent for profiling, so you have to explicitly use -prof when building code that uses TH with a profiled compiler. For this reason Cabal won't work with TH. We don't expect to ship a profiled compiler, so I think that's OK.

  Test Plan: validate with GhcProfiled=YES in validate.mk
  Reviewers: goldfire, bgamari, rwbarton, austin, hvr, erikd, ezyang
  Reviewed By: ezyang
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D1407
  GHC Trac Issues: #4837, #545
* Stop profiling output from running together (#8811) | Dave Laing | 2015-04-06 | 1 | -20/+54
  Reviewed By: thomie
  Differential Revision: https://phabricator.haskell.org/D779
* [skip ci] rts: Detabify Profiling.c | Austin Seipp | 2014-10-21 | 1 | -49/+49
  Signed-off-by: Austin Seipp <austin@well-typed.com>
* Revert "rts: add Emacs 'Local Variables' to every .c file" | Simon Marlow | 2014-09-29 | 1 | -8/+0
  This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
* rts: add Emacs 'Local Variables' to every .c file | Austin Seipp | 2014-07-28 | 1 | -0/+8
  This will hopefully help ensure some basic consistency going forward by overriding buffer variables. In particular, it sets the wrap length, the offset to 4, and turns off tabs.

  Signed-off-by: Austin Seipp <austin@well-typed.com>
* remove redundant condition checking in profiling RTS code | osa1 | 2014-07-02 | 1 | -4/+2
  Summary:
  A redundant condition check is removed, as discussed in http://www.haskell.org/pipermail/ghc-devs/2014-June/005088.html

  Test Plan: validate
  Reviewers: simonmar, austin
  Reviewed By: austin
  Subscribers: simonmar, relrod, carter
  Differential Revision: https://phabricator.haskell.org/D37
* Remove deprecated _scc_ (#8170) | Krzysztof Gogolewski | 2013-10-05 | 1 | -1/+1
* Don't move Capabilities in setNumCapabilities (#8209) | Simon Marlow | 2013-09-04 | 1 | -1/+1
  We have various problems with reallocating the array of Capabilities, due to threads in waitForReturnCapability that are already holding a pointer to a Capability.

  Rather than add more locking to make this safer, I decided it would be easier to ensure that we never move the Capabilities at all. The capabilities array is now an array of pointers to Capability. There are extra indirections, but it rarely matters - we don't often access Capabilities via the array, normally we already have a pointer to one. I ran the parallel benchmarks and didn't see any difference.
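The idea of keeping an array of pointers, so that individual Capabilities never move when the array grows, can be sketched as below. The `Capability` struct here is a one-field stand-in, and `growCapabilities` is a hypothetical helper, not the RTS's `setNumCapabilities`.

```c
#include <stdlib.h>

/* Stand-in for the RTS Capability struct; only an index is kept here. */
typedef struct {
    unsigned no;
} Capability;

Capability **capabilities = NULL;  /* array of pointers, not of structs */
unsigned n_capabilities = 0;

/* Grow the table to new_n entries. Returns 0 on success, -1 on failure. */
int growCapabilities(unsigned new_n)
{
    if (new_n <= n_capabilities) {
        return 0;
    }

    Capability **new_arr = malloc(new_n * sizeof *new_arr);
    if (new_arr == NULL) {
        return -1;
    }

    /* Existing Capabilities keep their addresses: only pointers are copied. */
    for (unsigned i = 0; i < n_capabilities; i++) {
        new_arr[i] = capabilities[i];
    }

    for (unsigned i = n_capabilities; i < new_n; i++) {
        new_arr[i] = malloc(sizeof(Capability));
        if (new_arr[i] == NULL) {
            /* Undo the allocations made by this call. */
            while (i > n_capabilities) {
                free(new_arr[--i]);
            }
            free(new_arr);
            return -1;
        }
        new_arr[i]->no = i;
    }

    /* Only the pointer array is replaced. A real multi-threaded runtime
       would also have to ensure no thread still reads the old array. */
    free(capabilities);
    capabilities = new_arr;
    n_capabilities = new_n;
    return 0;
}
```

A thread that saved `capabilities[0]` before the table grew still holds a valid pointer afterwards, which is the property the commit is after.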
* fprintCCS_stderr: untag the exception (#7319) | Simon Marlow | 2012-10-25 | 1 | -1/+1
* Fix a silly bug that would cause -xc to print less than useful information | Simon Marlow | 2012-10-23 | 1 | -1/+3
* Deprecate lnat, and use StgWord instead | Simon Marlow | 2012-09-07 | 1 | -1/+1
  lnat was originally "long unsigned int" but we were using it when we wanted a 64-bit type on a 64-bit machine. This broke on Windows x64, where long == int == 32 bits. Using types of unspecified size is bad, but what we really wanted was a type with N bits on an N-bit machine. StgWord is exactly that.

  lnat was mentioned in some APIs that clients might be using (e.g. StackOverflowHook()), so we leave it defined but with a comment to say that it's deprecated.
* Profiling: open .prof when -hr<cc> is specified | Takano Akio | 2012-08-20 | 1 | -1/+2
  The code for retainer profiling is used with e.g. +RTS -hc -hrfoo -RTS, as well as with +RTS -hr -RTS.
* Profiling: don't report IDLE time by default | Simon Marlow | 2012-07-11 | 1 | -2/+4
  You can get it with +RTS -P, as with the other systemish cost centres like "GC".
* More changes aimed at improving call stacks. | Simon Marlow | 2011-12-02 | 1 | -7/+7
  - Attach a SrcSpan to every CostCentre. This had the side effect that CostCentres that used to be merged because they had the same name are now considered distinct; so I had to add a Unique to CostCentre to give them distinct object-code symbols.
  - New flag: -fprof-auto-calls. This flag adds an automatic SCC to every call site (application, to be precise). This is typically more useful for call stacks than annotating whole functions.

  Various tidy-ups at the same time: removed unused NoCostCentre constructor, and refactored a bit in Coverage.lhs.

  The call stack we get from traceStack now looks like this:

    Stack trace:
      Main.CAF (<entire-module>)
      Main.main.xs (callstack002.hs:18:12-24)
      Main.map (callstack002.hs:13:12-16)
      Main.map.go (callstack002.hs:15:21-34)
      Main.map.go (callstack002.hs:15:21-23)
      Main.f (callstack002.hs:10:7-43)
* Forgot an initMutex(); fixes profthreaded failures on Windows | Simon Marlow | 2011-12-01 | 1 | -0/+4
* Make profiling work with multiple capabilities (+RTS -N) | Simon Marlow | 2011-11-29 | 1 | -45/+91
  This means that both time and heap profiling work for parallel programs. Main internal changes:

  - CCCS is no longer a global variable; it is now another pseudo-register in the StgRegTable struct. Thus every Capability has its own CCCS.
  - There is a new built-in CCS called "IDLE", which records ticks for Capabilities in the idle state. If you profile a single-threaded program with +RTS -N2, you'll see about 50% of time in "IDLE".
  - There is appropriate locking in rts/Profiling.c to protect the shared cost-centre-stack data structures.

  This patch does enough to get it working, but I have cut one big corner: the cost-centre-stack data structure is still shared amongst all Capabilities, which means that multiple Capabilities will race when updating the "allocations" and "entries" fields of a CCS. Not only does this give unpredictable results, but it runs very slowly due to cache line bouncing.

  It is strongly recommended that you use -fno-prof-count-entries to disable the "entries" count when profiling parallel programs. (I shall add a note to this effect to the docs).
* Time handling overhaul | Simon Marlow | 2011-11-25 | 1 | -4/+4
  Terminology cleanup: the type "Ticks" has been renamed "Time", which is an StgWord64 in units of TIME_RESOLUTION (currently nanoseconds). The terminology "tick" is now used consistently to mean the interval between timer signals.

  The ticker now always ticks in realtime (actually CLOCK_MONOTONIC if we have it). Before it used CPU time in the non-threaded RTS and realtime in the threaded RTS, but I've discovered that the CPU timer has terrible resolution (at least on Linux) and isn't much use for profiling. So now we always use realtime. This should also fix

  The default tick interval is now 10ms, except when profiling where we drop it to 1ms. This gives more accurate profiles without affecting runtime too much (<1%).

  Lots of cleanups - the resolution of Time is now in one place only (Rts.h) rather than having calculations that depend on the resolution scattered all over the RTS. I hope I found them all.
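Defining the resolution in a single place, as the commit above describes, can be illustrated with a few conversion macros. `Time` and `TIME_RESOLUTION` are named in the commit message. The macro names, the use of `uint64_t` in place of `StgWord64`, and the two interval constants are assumptions made for this sketch.

```c
#include <stdint.h>

/* Stand-in for StgWord64; the unit is 1/TIME_RESOLUTION seconds. */
typedef uint64_t Time;

/* The only place the resolution is written down: nanoseconds. */
#define TIME_RESOLUTION 1000000000ULL

/* Conversions derived from TIME_RESOLUTION (assumed helper names). */
#define SecondsToTime(s) ((Time)(s) * TIME_RESOLUTION)
#define MSToTime(ms)     ((Time)(ms) * (TIME_RESOLUTION / 1000))
#define TimeToMS(t)      ((t) / (TIME_RESOLUTION / 1000))

/* Tick intervals from the commit message: 10ms by default, 1ms when
   profiling (hypothetical constant names). */
#define DEFAULT_TICK_INTERVAL   MSToTime(10)
#define PROFILING_TICK_INTERVAL MSToTime(1)
```

If the resolution ever changed, only `TIME_RESOLUTION` would need editing; every interval expressed through the macros would follow.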
* +RTS -xc: print the closure type of the exception too | Simon Marlow | 2011-11-14 | 1 | -2/+22
* get the column widths right for Unicode SCC labels/modules | Simon Marlow | 2011-11-08 | 1 | -7/+29
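Column widths go wrong for Unicode labels when they are computed with a byte count, since a UTF-8 character may occupy several bytes. One common remedy, shown here as a sketch and not necessarily what this commit does, is to count code points by skipping UTF-8 continuation bytes. The function name `strlen_utf8` is an assumption.

```c
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string.
   Continuation bytes have the bit pattern 10xxxxxx and are skipped. */
size_t strlen_utf8(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}
```

Code points are only an approximation of display width: combining characters and double-width characters would need additional handling.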
* Overhaul of infrastructure for profiling, coverage (HPC) and breakpoints | Simon Marlow | 2011-11-02 | 1 | -574/+626
  User visible changes
  ====================

  Profiling
  ---------

  Flags renamed (the old ones are still accepted for now):

    OLD         NEW
    ---------   ------------
    -auto-all   -fprof-auto
    -auto       -fprof-exported
    -caf-all    -fprof-cafs

  New flags:

    -fprof-auto              Annotates all bindings (not just top-level ones) with SCCs
    -fprof-top               Annotates just top-level bindings with SCCs
    -fprof-exported          Annotates just exported bindings with SCCs
    -fprof-no-count-entries  Do not maintain entry counts when profiling (can make profiled code go faster; useful with heap profiling where entry counts are not used)

  Cost-centre stacks have a new semantics, which should in most cases result in more useful and intuitive profiles. If you find this not to be the case, please let me know. This is the area where I have been experimenting most, and the current solution is probably not the final version, however it does address all the outstanding bugs and seems to be better than GHC 7.2.

  Stack traces
  ------------

  +RTS -xc now gives more information. If the exception originates from a CAF (as is common, because GHC tends to lift exceptions out to the top-level), then the RTS walks up the stack and reports the stack in the enclosing update frame(s).

  Result: +RTS -xc is much more useful now - but you still have to compile for profiling to get it. I've played around a little with adding 'head []' to GHC itself, and +RTS -xc does pinpoint the problem quite accurately.

  I plan to add more facilities for stack tracing (e.g. in GHCi) in the future.

  Coverage (HPC)
  --------------

  * derived instances are now coloured yellow if they weren't used
  * likewise record field names
  * entry counts are more accurate (hpc --fun-entry-count)
  * tab width is now correct (markup was previously off in source with tabs)

  Internal changes
  ================

  In Core, the Note constructor has been replaced by

    Tick (Tickish b) (Expr b)

  which is used to represent all the kinds of source annotation we support: profiling SCCs, HPC ticks, and GHCi breakpoints.

  Depending on the properties of the Tickish, different transformations apply to Tick. See CoreUtils.mkTick for details.

  Tickets
  =======

  This commit closes the following tickets, test cases to follow:

  - Close #2552: not a bug, but the behaviour is now more intuitive (test is T2552)
  - Close #680 (test is T680)
  - Close #1531 (test is result001)
  - Close #949 (test is T949)
  - Close #2466: test case has bitrotted (doesn't compile against current version of vector-space package)
* Change the way module initialisation is done (#3252, #4417) | Simon Marlow | 2011-04-12 | 1 | -40/+28
  Previously the code generator generated small code fragments labelled with __stginit_M for each module M, and these performed whatever initialisation was necessary for that module and recursively invoked the initialisation functions for imported modules. This approach had drawbacks:

  - FFI users had to call hs_add_root() to ensure the correct initialisation routines were called. This is a non-standard, and ugly, API.
  - unless we were using -split-objs, the __stginit dependencies would entail linking the whole transitive closure of modules imported, whether they were actually used or not. In an extreme case (#4387, #4417), a module from GHC might be imported for use in Template Haskell or an annotation, and that would force the whole of GHC to be needlessly linked into the final executable.

  So now instead we do our initialisation with C functions marked with __attribute__((constructor)), which are automatically invoked at program startup time (or DSO load-time). The C initialisers are emitted into the stub.c file. This means that every time we compile with -prof or -hpc, we now get a stub file, but thanks to #3687 that is now invisible to the user.

  There are some refactorings in the RTS (particularly for HPC) to handle the fact that initialisers now get run earlier than they did before.

  The __stginit symbols are still generated, and the hs_add_root() function still exists (but does nothing), for backwards compatibility.
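For readers unfamiliar with `__attribute__((constructor))`, the sketch below shows the general pattern: a function carrying the attribute runs automatically before `main()` (or when a shared object is loaded) and can register per-module data. The attribute is a GCC/Clang extension. The `ModuleInfo` type, the `registered_modules` list, and `init_Main` are hypothetical and do not reflect what GHC emits into its stub files.

```c
#include <stddef.h>

/* Hypothetical per-module record to be registered at startup. */
typedef struct ModuleInfo_ {
    const char *name;
    struct ModuleInfo_ *next;
} ModuleInfo;

/* Hypothetical global registry, filled in before main() runs. */
ModuleInfo *registered_modules = NULL;

static ModuleInfo main_module = { "Main", NULL };

/* GCC/Clang extension: invoked automatically at program startup or at
   DSO load time, with no explicit call from user code. */
static void __attribute__((constructor)) init_Main(void)
{
    main_module.next = registered_modules;
    registered_modules = &main_module;
}
```

Because each initialiser is self-contained, nothing has to call it by name, which is why an explicit entry point such as hs_add_root() is no longer needed to trigger initialisation.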