summaryrefslogtreecommitdiff
path: root/rts/Trace.h
Commit message (Collapse)AuthorAgeFilesLines
* rts: Flush eventlog buffers from flushEventLogBen Gamari2020-11-241-1/+0
| | | | | | | | | | | | As noted in #18043, flushTrace failed flush anything beyond the writer. This means that a significant amount of data sitting in capability-local event buffers may never get flushed, despite the users' pleads for us to flush. Fix this by making flushEventLog flush all of the event buffers before flushing the writer. Fixes #18043.
* Modules (#13009)Sylvain Henry2020-04-181-1/+1
| | | | | | | | | | | | | | * SysTools * Parser * GHC.Builtin * GHC.Iface.Recomp * Settings Update Haddock submodule Metric Decrease: Naperian parsing001
* Merge non-moving garbage collectorBen Gamari2019-10-231-0/+22
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This introduces a concurrent mark & sweep garbage collector to manage the old generation. The concurrent nature of this collector typically results in significantly reduced maximum and mean pause times in applications with large working sets. Due to the large and intricate nature of the change I have opted to preserve the fully-buildable history, including merge commits, which is described in the "Branch overview" section below. Collector design ================ The full design of the collector implemented here is described in detail in a technical note > B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell > Compiler" (2018) This document can be requested from @bgamari. The basic heap structure used in this design is heavily inspired by > K. Ueno & A. Ohori. "A fully concurrent garbage collector for > functional programs on multicore processors." /ACM SIGPLAN Notices/ > Vol. 51. No. 9 (presented at ICFP 2016) This design is intended to allow both marking and sweeping concurrent to execution of a multi-core mutator. Unlike the Ueno design, which requires no global synchronization pauses, the collector introduced here requires a stop-the-world pause at the beginning and end of the mark phase. To avoid heap fragmentation, the allocator consists of a number of fixed-size /sub-allocators/. Each of these sub-allocators allocators into its own set of /segments/, themselves allocated from the block allocator. Each segment is broken into a set of fixed-size allocation blocks (which back allocations) in addition to a bitmap (used to track the liveness of blocks) and some additional metadata (used also used to track liveness). This heap structure enables collection via mark-and-sweep, which can be performed concurrently via a snapshot-at-the-beginning scheme (although concurrent collection is not implemented in this patch). Implementation structure ======================== The majority of the collector is implemented in a handful of files: * `rts/Nonmoving.c` is the heart of the beast. It implements the entry-point to the nonmoving collector (`nonmoving_collect`), as well as the allocator (`nonmoving_allocate`) and a number of utilities for manipulating the heap. * `rts/NonmovingMark.c` implements the mark queue functionality, update remembered set, and mark loop. * `rts/NonmovingSweep.c` implements the sweep loop. * `rts/NonmovingScav.c` implements the logic necessary to scavenge the nonmoving heap. Branch overview =============== ``` * wip/gc/opt-pause: | A variety of small optimisations to further reduce pause times. | * wip/gc/compact-nfdata: | Introduce support for compact regions into the non-moving |\ collector | \ | \ | | * wip/gc/segment-header-to-bdescr: | | | Another optimization that we are considering, pushing | | | some segment metadata into the segment descriptor for | | | the sake of locality during mark | | | | * | wip/gc/shortcutting: | | | Support for indirection shortcutting and the selector optimization | | | in the non-moving heap. | | | * | | wip/gc/docs: | |/ Work on implementation documentation. | / |/ * wip/gc/everything: | A roll-up of everything below. |\ | \ | |\ | | \ | | * wip/gc/optimize: | | | A variety of optimizations, primarily to the mark loop. | | | Some of these are microoptimizations but a few are quite | | | significant. In particular, the prefetch patches have | | | produced a nontrivial improvement in mark performance. | | | | | * wip/gc/aging: | | | Enable support for aging in major collections. | | | | * | wip/gc/test: | | | Fix up the testsuite to more or less pass. | | | * | | wip/gc/instrumentation: | | | A variety of runtime instrumentation including statistics | | / support, the nonmoving census, and eventlog support. | |/ | / |/ * wip/gc/nonmoving-concurrent: | The concurrent write barriers. | * wip/gc/nonmoving-nonconcurrent: | The nonmoving collector without the write barriers necessary | for concurrent collection. | * wip/gc/preparation: | A merge of the various preparatory patches that aren't directly | implementing the GC. | | * GHC HEAD . . . ```
| * NonmovingCensus: Emit samples to eventlogwip/gc/instrumentationBen Gamari2019-10-221-0/+5
| |
| * rts: Tracing support for nonmoving collection eventsBen Gamari2019-10-221-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This introduces a few events to mark key points in the nonmoving garbage collection cycle. These include: * `EVENT_CONC_MARK_BEGIN`, denoting the beginning of a round of marking. This may happen more than once in a single major collection since we the major collector iterates until it hits a fixed point. * `EVENT_CONC_MARK_END`, denoting the end of a round of marking. * `EVENT_CONC_SYNC_BEGIN`, denoting the beginning of the post-mark synchronization phase * `EVENT_CONC_UPD_REM_SET_FLUSH`, indicating that a capability has flushed its update remembered set. * `EVENT_CONC_SYNC_END`, denoting that all mutators have flushed their update remembered sets. * `EVENT_CONC_SWEEP_BEGIN`, denoting the beginning of the sweep portion of the major collection. * `EVENT_CONC_SWEEP_END`, denoting the end of the sweep portion of the major collection.
| * rts: Introduce debug flag for non-moving GCBen Gamari2019-10-201-0/+1
| |
* | eventlog: Dump cost centre stack on each sampleMatthew Pickering2019-10-231-0/+4
|/ | | | | | | | | | | | | | | | With this change it is possible to reconstruct the timing portion of a `.prof` file after the fact. By logging the stacks at each time point a more precise executation trace of the program can be observed rather than all identical cost centres being identified in the report. There are two new events: 1. `EVENT_PROF_BEGIN` - emitted at the start of profiling to communicate the tick interval 2. `EVENT_PROF_SAMPLE_COST_CENTRE` - emitted on each tick to communicate the current call stack. Fixes #17322
* Add new debug flag -DZTobias Guggenmos2019-10-031-0/+1
| | | | Zeros heap memory after gc freed it.
* eventlog: Add biographical and retainer profiling tracesMatthew Pickering2019-09-171-0/+2
| | | | | | | | | | This patch adds a new eventlog event which indicates the start of a biographical profiler sample. These are different to normal events as they also include the timestamp of when the census took place. This is because the LDV profiler only emits samples at the end of the run. Now all the different profiling modes emit consumable events to the eventlog.
* Add HEAP_PROF_SAMPLE_END event to mark end of samplesMatthew Pickering2019-06-071-0/+2
| | | | | | | This allows a user to observe how long a sampling period lasts so that the time taken can be removed from the profiling output. Fixes #16697
* Add traceBinaryEvent# primopMitsutoshi Aoe2018-08-211-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a new primop called traceBinaryEvent# that takes the length of binary data and a pointer to the data, then emits it to the eventlog. There is some example code that uses this primop and the new event: * [traceBinaryEventIO][1] that calls `traceBinaryEvent#` * [A patch to ghc-events][2] that parses the new `EVENT_USER_BINARY_MSG` There's no corresponding issue on Trac but it was discussed at ghc-devs [3]. [1] https://github.com/maoe/ghc-trace-events/blob /fb226011ef1f85a97b4da7cc9d5f98f9fe6316ae/src/Debug/Trace/Binary.hs#L29) [2] https://github.com/maoe/ghc-events/commit /239ca77c24d18cdd10d6d85a0aef98e4a7c56ae6) [3] https://mail.haskell.org/pipermail/ghc-devs/2018-May/015791.html Reviewers: bgamari, erikd, simonmar Reviewed By: bgamari Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D5007
* rts: Flush eventlog in hs_init_ghc (fixes #15440)Mitsutoshi Aoe2018-07-271-0/+4
| | | | | | | | | | | | | | | Without this change RTS typically doesn't flush some important events until the process terminates or it doesn't write them at all in case it terminates abnormally. Here is a list of such events: * EVENT_WALL_CLOCK_TIME * EVENT_OS_PROCESS_PID * EVENT_OS_PROCESS_PPID * EVENT_RTS_IDENTIFIER * EVENT_PROGRAM_ARGS * EVENT_PROGRAM_ENV
* Fix missing escape in macroMoritz Angermann2017-07-121-1/+1
| | | | | | | | | | Reviewers: angerman, austin, bgamari, erikd, simonmar Reviewed By: angerman Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3727
* Fix Work Balance computation in RTS statsDouglas Wilson2017-07-111-7/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | An additional stat is tracked per gc: par_balanced_copied This is the the number of bytes copied by each gc thread under the balanced lmit, which is simply (copied_bytes / num_gc_threads). The stat is added to all the appropriate GC structures, so is visible in the eventlog and in GHC.Stats. A note is added explaining how work balance is computed. Remove some end of line whitespace Test Plan: ./validate experiment with the program attached to the ticket examine code changes carefully Reviewers: simonmar, austin, hvr, bgamari, erikd Reviewed By: simonmar Subscribers: Phyx, rwbarton, thomie GHC Trac Issues: #13830 Differential Revision: https://phabricator.haskell.org/D3658
* Prefer #if defined to #ifdefBen Gamari2017-04-281-6/+6
| | | | Our new CPP linter enforces this.
* cpp: Use #pragma once instead of #ifndef guardsBen Gamari2017-04-231-4/+1
| | | | | | | | | | | | | | This both says what we mean and silences a bunch of spurious CPP linting warnings. This pragma is supported by all CPP implementations which we support. Reviewers: austin, erikd, simonmar, hvr Reviewed By: simonmar Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3482
* Overhaul of Compact Regions (#12455)Simon Marlow2016-12-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This commit makes various improvements and addresses some issues with Compact Regions (aka Compact Normal Forms). This was the most important thing I wanted to fix. Compaction previously prevented GC from running until it was complete, which would be a problem in a multicore setting. Now, we compact using a hand-written Cmm routine that can be interrupted at any point. When a GC is triggered during a sharing-enabled compaction, the GC has to traverse and update the hash table, so this hash table is now stored in the StgCompactNFData object. Previously, compaction consisted of a deepseq using the NFData class, followed by a traversal in C code to copy the data. This is now done in a single pass with hand-written Cmm (see rts/Compact.cmm). We no longer use the NFData instances, instead the Cmm routine evaluates components directly as it compacts. The new compaction is about 50% faster than the old one with no sharing, and a little faster on average with sharing (the cost of the hash table dominates when we're doing sharing). Static objects that don't (transitively) refer to any CAFs don't need to be copied into the compact region. In particular this means we often avoid copying Char values and small Int values, because these are static closures in the runtime. Each Compact# object can support a single compactAdd# operation at any given time, so the Data.Compact library now enforces mutual exclusion using an MVar stored in the Compact object. We now get exceptions rather than killing everything with a barf() when we encounter an object that cannot be compacted (a function, or a mutable object). We now also detect pinned objects, which can't be compacted either. The Data.Compact API has been refactored and cleaned up. A new compactSize operation returns the size (in bytes) of the compact object. Most of the documentation is in the Haddock docs for the compact library, which I've expanded and improved here. Various comments in the code have been improved, especially the main Note [Compact Normal Forms] in rts/sm/CNF.c. I've added a few tests, and expanded a few of the tests that were there. We now also run the tests with GHCi, and in a new test way that enables sanity checking (+RTS -DS). There's a benchmark in libraries/compact/tests/compact_bench.hs for measuring compaction speed and comparing sharing vs. no sharing. The field totalDataW in StgCompactNFData was unnecessary. Test Plan: * new unit tests * validate * tested manually that we can compact Data.Aeson data Reviewers: gcampax, bgamari, ezyang, austin, niteria, hvr, erikd Subscribers: thomie, simonpj Differential Revision: https://phabricator.haskell.org/D2751 GHC Trac Issues: #12455
* Remove the DEBUG_<blah> variables, use RtsFlags directlySimon Marlow2016-08-031-17/+17
|
* Only trace cap/capset events if we're tracing anything elseSimon Marlow2016-08-031-10/+15
| | | | | | | | | | | | | | | | Summary: I was getting annoyed by cap/capset messages when using +RTS -DS, which doesn't cause any other trace messages to be emitted. This makes it possible to add --with-rtsopts=-DS when running tests, and not have all the tests fail due to spurious trace messages. Test Plan: validate Reviewers: duncan, bgamari, ezyang, austin, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2438
* Log heap profiler samples to event logBen Gamari2016-07-161-0/+19
| | | | | | | | | | | | Test Plan: Try it Reviewers: hvr, simonmar, austin, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1722 GHC Trac Issues: #11094
* rts: Replace `nat` with `uint32_t`Erik de Castro Lopo2016-05-051-12/+12
| | | | | | | | | | | | The `nat` type was an alias for `unsigned int` with a comment saying it was at least 32 bits. We keep the typedef in case client code is using it but mark it as deprecated. Test Plan: Validated on Linux, OS X and Windows Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20 Differential Revision: https://phabricator.haskell.org/D2166
* rts: drop unused 'traceEventThreadRunnable'Sergei Trofimovich2016-02-071-11/+0
| | | | | | | | | | | | | | | | Not used since: commit f361281c89fbce42865d8b8b27b0957205366186 Author: Simon Marlow <marlowsd@gmail.com> Date: Wed Dec 7 11:32:35 2011 +0000 Do not emit the THREAD_RUNNABLE event; it has no useful semantic content Noticed by uselex.rb: traceEventThreadRunnable: [R]: exported from: ./rts/dist/build/Inlines.o Signed-off-by: Sergei Trofimovich <siarheit@google.com>
* rts: Clean up whitespace in Trace.hBen Gamari2015-09-261-24/+24
|
* tracing: Kill EVENT_STARTUPBen Gamari2015-09-051-24/+0
| | | | | This has been unnecessary for quite some time due to the create/delete capability events.
* Revert "rts: add Emacs 'Local Variables' to every .c file"Simon Marlow2014-09-291-8/+0
| | | | This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
* rts: add Emacs 'Local Variables' to every .c fileAustin Seipp2014-07-281-0/+8
| | | | | | | | This will hopefully help ensure some basic consistency in the forward by overriding buffer variables. In particular, it sets the wrap length, the offset to 4, and turns off tabs. Signed-off-by: Austin Seipp <austin@well-typed.com>
* Add a new traceMarker# primop for use in profiling outputDuncan Coutts2012-10-151-0/+10
| | | | | | | | | In time-based profiling visualisations (e.g. heap profiles and ThreadScope) it would be useful to be able to mark particular points in the execution and have those points in time marked in the visualisation. The traceMarker# primop currently emits an event into the eventlog. In principle it could be extended to do something in the heap profiling too.
* Deprecate lnat, and use StgWord insteadSimon Marlow2012-09-071-22/+22
| | | | | | | | | | | | lnat was originally "long unsigned int" but we were using it when we wanted a 64-bit type on a 64-bit machine. This broke on Windows x64, where long == int == 32 bits. Using types of unspecified size is bad, but what we really wanted was a type with N bits on an N-bit machine. StgWord is exactly that. lnat was mentioned in some APIs that clients might be using (e.g. StackOverflowHook()), so we leave it defined but with a comment to say that it's deprecated.
* Fix build on OS XIan Lynagh2012-07-151-2/+2
|
* Fix dtraceTaskCreateIan Lynagh2012-07-151-4/+6
| | | | The tid argument was missing
* Fix a CPP typoIan Lynagh2012-07-141-1/+1
|
* Have a go at fixing the heap info DTrace build failue on OSXDuncan Coutts2012-07-101-19/+0
| | | | | | | | This patch will need to be tested by someone on OSX. Fixed a couple wrong names: CapsetID vs EventCapsetID gc__sync vs gc__global__sync
* Define the task-tracking eventsDuncan Coutts2012-07-101-0/+62
| | | | | | | | | | | | | Based on initial patches by Mikolaj Konarski <mikolaj@well-typed.com> These new eventlog events are to let profiling tools keep track of all the OS threads that belong to an RTS capability at any moment in time. In the RTS, OS threads correspond to the Task abstraction, so that is what we track. There are events for tasks being created, migrated between capabilities and deleted. In particular the task creation event also records the kernel thread id which lets us match up the OS thread with data collected by others tools (in the initial use case with Linux's perf tool, but in principle also with DTrace).
* Fix RTS build on OS XManuel M T Chakravarty2012-04-101-2/+20
| | | | | | * The following commits made validate fail on OS X (Lion): 65aaa9b2715c5245838123f3a0fa5d92e0a66bce and c294d95dc04950ab4c5380bf6ce8651f621f8591 * I just commented out all offending code until it validated again. The original authors need to clean this up.
* Add the GC_GLOBAL_SYNC event marking that all caps are stopped for GCMikolaj2012-04-041-0/+9
| | | | | | | | | Quoting design rationale by dcoutts: The event indicates that we're doing a stop-the-world GC and all other HECs should be between their GC_START and GC_END events at that moment. We don't want to use GC_STATS_GHC for that, because GC_STATS_GHC is for extra GHC-specific info, not something we have to rely on to be able to match the GC pauses across HECs to a particular global GC.
* Fix the timestamps in GC_START and GC_END events on the GC-initiating capMikolaj2012-04-041-0/+25
| | | | | | | | | | | There was a discrepancy between GC times reported in +RTS -s and the timestamps of GC_START and GC_END events on the cap, on which +RTS -s stats for the given GC are based. This is fixed by posting the events with exactly the same timestamp as generated for the stat calculation. The calls posting the events are moved too, so that the events are emitted close to the time instant they claim to be emitted at. The GC_STATS_GHC was moved, too, ensuring it's emitted before the moved GC_END on all caps, which simplifies tools code.
* Add new eventlog events for various heap and GC statisticsDuncan Coutts2012-04-041-2/+140
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They cover much the same info as is available via the GHC.Stats module or via the '+RTS -s' textual output, but via the eventlog and with a better sampling frequency. We have three new generic heap info events and two very GHC-specific ones. (The hope is the general ones are usable by other implementations that use the same eventlog system, or indeed not so sensitive to changes in GHC itself.) The general ones are: * total heap mem allocated since prog start, on a per-HEC basis * current size of the heap (MBlocks reserved from OS for the heap) * current size of live data in the heap Currently these are all emitted by GHC at GC time (live data only at major GC). The GHC specific ones are: * an event giving various static heap paramaters: * number of generations (usually 2) * max size if any * nursary size * MBlock and block sizes * a event emitted on each GC containing: * GC generation (usually just 0,1) * total bytes copied * bytes lost to heap slop and fragmentation * the number of threads in the parallel GC (1 for serial) * the maximum number of bytes copied by any par GC thread * the total number of bytes copied by all par GC threads (these last three can be used to calculate an estimate of the work balance in parallel GCs)
* Add eventlog/trace stuff for capabilities: create/delete/enable/disableDuncan Coutts2012-04-041-19/+50
| | | | | | | | | | | | | | | | | | | | | | | Now that we can adjust the number of capabilities on the fly, we need this reflected in the eventlog. Previously the eventlog had a single startup event that declared a static number of capabilities. Obviously that's no good anymore. For compatability we're keeping the EVENT_STARTUP but adding new EVENT_CAP_CREATE/DELETE. The EVENT_CAP_DELETE is actually just the old EVENT_SHUTDOWN but renamed and extended (using the existing mechanism to extend eventlog events in a compatible way). So we now emit both EVENT_STARTUP and EVENT_CAP_CREATE. One day we will drop EVENT_STARTUP. Since reducing the number of capabilities at runtime does not really delete them, it just disables them, then we also have new events for disable/enable. The old EVENT_SHUTDOWN was in the scheduler class of events. The new EVENT_CAP_* events are in the unconditional class, along with the EVENT_CAPSET_* ones. Knowing when capabilities are created and deleted is crucial to making sense of eventlogs, you always want those events. In any case, they're extremely low volume.
* Allow the number of capabilities to be increased at runtime (#3729)Simon Marlow2011-12-061-0/+1
| | | | | At present the number of capabilities can only be *increased*, not decreased. The latter presents a few more challenges!
* Add eventlog event for thread labelsDuncan Coutts2011-11-041-0/+23
| | | | | | The existing GHC.Conc.labelThread will now also emit the the thread label into the eventlog. Profiling tools like ThreadScope could then use the thread labels rather than thread numbers.
* Add an RTS eventlog tracing class for user messagesDuncan Coutts2011-10-271-0/+1
| | | | Enables people to turn them on/off. Defaults to on.
* Add new eventlog EVENT_WALL_CLOCK_TIME for time matchingDuncan Coutts2011-10-261-0/+10
| | | | | | | | | | | | | | Eventlog timestamps are elapsed times (in nanoseconds) relative to the process start. To be able to merge eventlogs from multiple processes we need to be able to align their timelines. If they share a clock domain (or a user judges that their clocks are sufficiently closely synchronised) then it is sufficient to know how the eventlog timestamps match up with the clock. The EVENT_WALL_CLOCK_TIME contains the clock time with (up to) nanosecond precision. It is otherwise an ordinary event and so contains the usual timestamp for the same moment in time. It therefore enables us to match up all the eventlog timestamps with clock time.
* Rename traceCapsetModify for consistency and clarityDuncan Coutts2011-10-261-8/+8
|
* Fix build on OSX, thanks to David PeixottoBen Lippmeier2011-07-221-1/+1
|
* Add new fully-accurate per-spark trace/eventlog eventsDuncan Coutts2011-07-181-38/+90
| | | | | | | | | | | | | | Replaces the existing EVENT_RUN/STEAL_SPARK events with 7 new events covering all stages of the spark lifcycle: create, dud, overflow, run, steal, fizzle, gc The sampled spark events are still available. There are now two event classes for sparks, the sampled and the fully accurate. They can be enabled/disabled independently. By default +RTS -l includes the sampled but not full detail spark events. Use +RTS -lf-p to enable the detailed 'f' and disable the sampled 'p' spark. Includes work by Mikolaj <mikolaj.konarski@gmail.com>
* Move GC tracing into a separate trace classDuncan Coutts2011-07-181-30/+31
| | | | | | | Previously GC was included in the scheduler trace class. It can be enabled specifically with +RTS -vg or -lg, though note that both -v and -l on their own now default to a sensible set of trace classes, currently: scheduler, gc and sparks.
* include spark counters in the sparks trace classMikolaj2011-07-181-1/+1
|
* add a new trace class for spark eventsDuncan Coutts2011-07-181-3/+10
|
* Add spark counter tracingDuncan Coutts2011-07-181-16/+42
| | | | | | | A new eventlog event containing 7 spark counters/statistics: sparks created, dud, overflowed, converted, GC'd, fizzled and remaining. These are maintained and logged separately for each capability. We log them at startup, on each GC (minor and major) and on shutdown.
* Fix build on OS X: Correct silly errors in Trace.hIan Lynagh2011-06-271-3/+3
|