path: root/includes
Commit message (Author, Age, Files, Lines)
* use (GHC) idiomatic types (Gabor Greif, 2012-01-09, 1 file, -4/+4)
|
* Make the RTS linker API use wide-char pathnames on Windows (#5697) (Simon Marlow, 2012-01-09, 1 file, -6/+12)
| I haven't been able to test whether this works or not due to #5754, but at least it doesn't appear to break anything.
* Refactoring (Ian Lynagh, 2012-01-08, 1 file, -5/+3)
| This is working towards being able to put ghcautoconf.h and ghcplatform.h in includes/dist
* setNumCapabilities: don't barf() if it isn't supported, just print an error (Simon Marlow, 2012-01-06, 1 file, -4/+0)
|
* abstract away from the 'build-toolchain'-dependent sizeof(...) operator (Gabor Greif, 2012-01-06, 1 file, -10/+14)
| The sizes obtained this way do not work on a target system in general. So in a future cross-compilable setup we need another way of obtaining expansions for the macros OFFSET, FIELD_SIZE and TYPE_SIZE.
|
| Guarded against accidental use of 'sizeof' by poisoning.
|
| Verified that the generated *Constants.h/hs files are unchanged.
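The poisoning guard described in this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the commit: the struct `DemoClosure` and the macro bodies are made up for the example, and the `#pragma GCC poison` line is only active when the hypothetical `DEMO_POISON_SIZEOF` macro is defined, because headers included afterwards may themselves use `sizeof`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an RTS structure; not taken from GHC. */
struct DemoClosure {
    uint32_t header;
    uint32_t payload;
};

/* Every size and offset query goes through these three macros, so a
 * cross-compiling build could later substitute target-specific values. */
#define OFFSET(s_type, field)     (offsetof(s_type, field))
#define FIELD_SIZE(s_type, field) (sizeof(((s_type *)0)->field))
#define TYPE_SIZE(type)           (sizeof(type))

#ifdef DEMO_POISON_SIZEOF
/* From here on, writing sizeof directly is a compile-time error (GCC and
 * Clang). Macros defined before this line may still expand to it. */
#pragma GCC poison sizeof
#endif
```

With the guard enabled, code below the pragma has to use `TYPE_SIZE(struct DemoClosure)` rather than `sizeof(struct DemoClosure)`, which keeps all toolchain-dependent sizes behind one replaceable layer.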
* Give the correct type to CCCS (Simon Marlow, 2012-01-05, 1 file, -1/+1)
| Needed by #5357
* Rename struct _CostCentreStack to struct CostCentreStack_ for consistency (Simon Marlow, 2012-01-05, 2 files, -9/+9)
| Needed by #5357
* Rename the CCCS field of StgTSO so as not to conflict with the CCCS pseudo-register (Simon Marlow, 2012-01-05, 2 files, -2/+2)
| Needed by #5357
* Fix the C backend after making CCCS an STG register (Simon Marlow, 2012-01-03, 1 file, -0/+6)
|
* Fix alignment in the CostCentre struct (#5710) (Simon Marlow, 2011-12-19, 1 file, -1/+1)
|
* New flag +RTS -qi<n>, avoid waking up idle Capabilities to do parallel GC (Simon Marlow, 2011-12-13, 1 file, -0/+8)
| This is an experimental tweak to the parallel GC that avoids waking up a Capability to do parallel GC if we know that the capability has been idle for a (tunable) number of GC cycles. The idea is that if you're only using a few Capabilities, there's no point waking up the ones that aren't busy.
|
| e.g. +RTS -qi3 says "A Capability will participate in parallel GC if it was running at all since the last 3 GC cycles."
|
| Results are a bit hit and miss, and I don't completely understand why yet. Hence, for now it is turned off by default, and also not documented except in the +RTS -? output.
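The -qi<n> rule can be pictured with a small sketch. Everything here is hypothetical (the `DemoCapability` type, the `idle_gcs` counter, the function names), and treating a threshold of 0 as "feature off" is an assumption made for the example; the commit only says the flag is off by default.

```c
#include <stdbool.h>

/* Hypothetical per-Capability bookkeeping; the field name is an assumption. */
typedef struct {
    unsigned idle_gcs; /* consecutive GC cycles during which this Capability did not run */
} DemoCapability;

/* Called once per GC cycle for each Capability, before the decision below. */
static void demo_note_gc(DemoCapability *cap, bool ran_since_last_gc)
{
    if (ran_since_last_gc) {
        cap->idle_gcs = 0;
    } else {
        cap->idle_gcs++;
    }
}

/* qi_threshold models the <n> in +RTS -qi<n>; 0 is assumed to mean "off",
 * in which case every Capability is woken up for parallel GC. */
static bool demo_joins_parallel_gc(const DemoCapability *cap, unsigned qi_threshold)
{
    if (qi_threshold == 0) {
        return true;
    }
    return cap->idle_gcs < qi_threshold;
}
```

With a threshold of 3, a Capability that sat out two GC cycles still participates, while one that has been idle for three cycles is left asleep until it runs again.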
* Define getNumberOfProcessors() even when !THREADED_RTS (Simon Marlow, 2011-12-07, 1 file, -4/+7)
|
* Add new primtypes 'ArrayArray#' and 'MutableArrayArray#' (Manuel M T Chakravarty, 2011-12-07, 1 file, -0/+1)
| The primitive array types, such as 'ByteArray#', have kind #, but are represented by pointers. They are boxed, but unpointed types (i.e., they cannot be 'undefined').
|
| The two categories of array types ([Mutable]Array# and [Mutable]ByteArray#) are containers for unboxed (and unpointed) as well as for boxed and pointed types. So far, we lacked support for containers for boxed, unpointed types (i.e., containers for the primitive arrays themselves). This is what the new primtypes provide.
|
| Containers for boxed, unpointed types are crucial for the efficient implementation of scattered nested arrays, which are central to the new DPH backend library dph-lifted-vseg. Without such containers, we cannot eliminate all unboxing from the inner loops of traversals processing scattered nested arrays.
* Allow the number of capabilities to be increased at runtime (#3729) (Simon Marlow, 2011-12-06, 1 file, -0/+10)
| At present the number of capabilities can only be *increased*, not decreased. The latter presents a few more challenges!
* Make forkProcess work with +RTS -N (Simon Marlow, 2011-12-06, 2 files, -20/+33)
| Consider this experimental for the time being. There are a lot of things that could go wrong, but I've verified that at least it works on the test cases we have.
|
| I also did some API cleanups while I was here. Previously we had:
|
|   Capability * rts_eval (Capability *cap, HaskellObj p, /*out*/HaskellObj *ret);
|
| but this API is particularly error-prone: if you forget to discard the Capability * you passed in and use the return value instead, then you're in for subtle bugs with +RTS -N later on. So I changed all these functions to this form:
|
|   void rts_eval (/* inout */ Capability **cap, /* in */ HaskellObj p, /* out */ HaskellObj *ret)
|
| It's much harder to use this version incorrectly, because you have to pass the Capability in by reference.
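The in/out Capability pattern can be tried out without the RTS using a mock. The sketch below is only an illustration: `DemoCapability`, `demo_caps` and `demo_eval` are hypothetical stand-ins, and the "migration" is simulated by switching to the other array element.

```c
/* Hypothetical stand-in; the real Capability type lives in the RTS. */
typedef struct {
    int no;
} DemoCapability;

static DemoCapability demo_caps[2] = { { 0 }, { 1 } };

/* New-style signature: the callee may move the caller to another Capability
 * and reports that by updating *cap in place, so the caller cannot keep
 * using a stale pointer by accident. */
static void demo_eval(DemoCapability **cap, int in, int *out)
{
    *out = in * 2;                            /* pretend to evaluate something */
    *cap = &demo_caps[((*cap)->no + 1) % 2];  /* pretend the thread migrated */
}
```

A caller holds a single `DemoCapability *cap`, passes `&cap` to every call, and simply keeps using `cap` afterwards; there is no second pointer to mix up, which is the point of the change described above.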
* Merge branch 'master' of http://darcs.haskell.org/ghc (Ian Lynagh, 2011-12-02, 1 file, -1/+3)
|\
| * More changes aimed at improving call stacks. (Simon Marlow, 2011-12-02, 1 file, -1/+3)
| | - Attach a SrcSpan to every CostCentre. This had the side effect that CostCentres that used to be merged because they had the same name are now considered distinct; so I had to add a Unique to CostCentre to give them distinct object-code symbols.
| |
| | - New flag: -fprof-auto-calls. This flag adds an automatic SCC to every call site (application, to be precise). This is typically more useful for call stacks than annotating whole functions.
| |
| | Various tidy-ups at the same time: removed unused NoCostCentre constructor, and refactored a bit in Coverage.lhs.
| |
| | The call stack we get from traceStack now looks like this:
| |
| |   Stack trace:
| |     Main.CAF (<entire-module>)
| |     Main.main.xs (callstack002.hs:18:12-24)
| |     Main.map (callstack002.hs:13:12-16)
| |     Main.map.go (callstack002.hs:15:21-34)
| |     Main.map.go (callstack002.hs:15:21-23)
| |     Main.f (callstack002.hs:10:7-43)
* | Fix header installation (Ian Lynagh, 2011-12-02, 1 file, -1/+1)
| |
* | Move includes/DerivedConstants.h and includes/GHCConstants.h into dist dirs (Ian Lynagh, 2011-12-02, 2 files, -7/+10)
|/
| When they existed, they were getting included in the includes_H_FILES variable (as it uses wildcard to find all header files). But the .depends files for the programs that generate the headers depend on $(includes_H_FILES), so the .depends files looked out-of-date once the headers had been created. This caused unnecessary make reinvocations. So now we put them in dist* directories, where they ought to be anyway.
* Fix a scheduling bug in the threaded RTS (Simon Marlow, 2011-12-01, 1 file, -0/+1)
| The parallel GC was using setContextSwitches() to stop all the other threads, which sets the context_switch flag on every Capability. That had the side effect of causing every Capability to also switch threads, and since GCs can be much more frequent than context switches, this increased the context switch frequency. When context switches are expensive (because the switch is between two bound threads or a bound and unbound thread), the difference is quite noticeable.
|
| The fix is to have a separate flag to indicate that a Capability should stop and return to the scheduler, but not switch threads. I've called this the "interrupt" flag.
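The difference between the two flags can be shown with a reduced model. This is a sketch under assumptions: only the flag names `context_switch` and `interrupt` come from the message above, while the `DemoCapability` struct and the three helper functions are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical reduction of a Capability to the two flags discussed above. */
typedef struct {
    bool context_switch; /* running thread should yield to another thread */
    bool interrupt;      /* return to the scheduler, but keep the same thread */
} DemoCapability;

/* What a GC request does after the fix: only the interrupt flag is set. */
static void demo_request_gc_stop(DemoCapability *caps, int n)
{
    for (int i = 0; i < n; i++) {
        caps[i].interrupt = true;
    }
}

/* The running thread must return to the scheduler if either flag is set. */
static bool demo_must_return_to_scheduler(const DemoCapability *cap)
{
    return cap->context_switch || cap->interrupt;
}

/* Only a genuine context-switch request makes the scheduler pick another thread. */
static bool demo_should_switch_thread(const DemoCapability *cap)
{
    return cap->context_switch;
}
```

In this model a GC stops every Capability without forcing a thread switch, so frequent collections no longer raise the context-switch rate.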
* Make profiling work with multiple capabilities (+RTS -N) (Simon Marlow, 2011-11-29, 6 files, -5/+16)
| This means that both time and heap profiling work for parallel programs. Main internal changes:
|
| - CCCS is no longer a global variable; it is now another pseudo-register in the StgRegTable struct. Thus every Capability has its own CCCS.
|
| - There is a new built-in CCS called "IDLE", which records ticks for Capabilities in the idle state. If you profile a single-threaded program with +RTS -N2, you'll see about 50% of time in "IDLE".
|
| - There is appropriate locking in rts/Profiling.c to protect the shared cost-centre-stack data structures.
|
| This patch does enough to get it working, but I have cut one big corner: the cost-centre-stack data structure is still shared amongst all Capabilities, which means that multiple Capabilities will race when updating the "allocations" and "entries" fields of a CCS. Not only does this give unpredictable results, but it runs very slowly due to cache line bouncing.
|
| It is strongly recommended that you use -fno-prof-count-entries to disable the "entries" count when profiling parallel programs. (I shall add a note to this effect to the docs).
* Time handling overhaul (Simon Marlow, 2011-11-25, 3 files, -6/+51)
| Terminology cleanup: the type "Ticks" has been renamed "Time", which is an StgWord64 in units of TIME_RESOLUTION (currently nanoseconds). The terminology "tick" is now used consistently to mean the interval between timer signals.
|
| The ticker now always ticks in realtime (actually CLOCK_MONOTONIC if we have it). Before it used CPU time in the non-threaded RTS and realtime in the threaded RTS, but I've discovered that the CPU timer has terrible resolution (at least on Linux) and isn't much use for profiling. So now we always use realtime. This should also fix
|
| The default tick interval is now 10ms, except when profiling where we drop it to 1ms. This gives more accurate profiles without affecting runtime too much (<1%).
|
| Lots of cleanups - the resolution of Time is now in one place only (Rts.h) rather than having calculations that depend on the resolution scattered all over the RTS. I hope I found them all.
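Keeping the resolution in one place means every conversion is derived from a single constant. The sketch below assumes that TIME_RESOLUTION is the number of Time units per second and uses `uint64_t` as a stand-in for StgWord64; the helper names (`demo_ms_to_time`, `demo_time_to_us`, `demo_tick_interval`) are hypothetical and not taken from Rts.h.

```c
#include <stdint.h>

typedef uint64_t Time;                 /* stand-in for StgWord64 */
#define TIME_RESOLUTION 1000000000ULL  /* assumed: Time units per second (nanoseconds) */

/* Conversions are expressed only in terms of TIME_RESOLUTION, so changing
 * the resolution would not require touching any caller. */
static Time demo_ms_to_time(uint64_t ms)
{
    return ms * (TIME_RESOLUTION / 1000);
}

static uint64_t demo_time_to_us(Time t)
{
    return t / (TIME_RESOLUTION / 1000000);
}

/* Default tick interval from the message above: 10ms, or 1ms when profiling. */
static Time demo_tick_interval(int profiling)
{
    return demo_ms_to_time(profiling ? 1 : 10);
}
```

Under these assumptions the default tick interval is 10,000,000 Time units and the profiling interval is 1,000,000.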
* Remove registerised code for dead architectures: mips, ia64, alpha, (David Terei, 2011-11-22, 2 files, -418/+0)
| hppa1, m68k
* Tabs -> Spaces (David Terei, 2011-11-22, 1 file, -43/+43)
|
* Remove some old comments about the mangler (David Terei, 2011-11-22, 1 file, -5/+0)
|
* merge (Simon Marlow, 2011-11-22, 2 files, -1/+8)
|\
| * Add autoconf support to detect an LLVM-based C compiler (David M Peixotto, 2011-10-07, 1 file, -0/+4)
| | This patch adds support to the autoconf scripts to detect when we are using a C compiler that uses an LLVM back end. An LLVM back end does not support all of the extensions used by GCC, so we need to perform some conditional compilation in the runtime, particularly for handling thread local storage and global register variables.
| |
| | The changes here will set the CC_LLVM_BACKEND in the autoconf scripts if we detect an llvm-based compiler. We use this variable to define the llvm_CC_FLAVOR variable that we can use in the runtime code to conditionally compile for LLVM.
| * Enable pthread_getspecific() tls for LLVM compiler (David M Peixotto, 2011-10-07, 1 file, -1/+4)
| | LLVM does not support the __thread attribute for thread local storage and may generate incorrect code for global register variables. We want to allow building the runtime with LLVM-based compilers such as llvm-gcc and clang, particularly for MacOS.
| |
| | This patch changes the gct variable used by the garbage collector to use pthread_getspecific() for thread local storage when an llvm based compiler is used to build the runtime.
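The pthread-based alternative to a `__thread` variable looks roughly like this. It is a minimal sketch using only standard POSIX calls; `DemoGcThread`, `demo_gct_key` and the three helpers are hypothetical names, and the real gct variable points at RTS data that is not modelled here.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread GC state. */
typedef struct {
    int thread_index;
} DemoGcThread;

static pthread_key_t demo_gct_key;

/* Create the key once, before any thread reads or writes its slot.
 * Returns 0 on success, like pthread_key_create itself. */
static int demo_init_gct_key(void)
{
    return pthread_key_create(&demo_gct_key, NULL);
}

/* Store this thread's pointer in its own slot. */
static int demo_set_gct(DemoGcThread *gct)
{
    return pthread_setspecific(demo_gct_key, gct);
}

/* Read this thread's pointer back; NULL if the thread never set one. */
static DemoGcThread *demo_get_gct(void)
{
    return (DemoGcThread *)pthread_getspecific(demo_gct_key);
}
```

Each access now costs a function call instead of a register or segment-relative load, which is the trade-off accepted here in exchange for building with compilers that lack `__thread` support.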
* | Improve the way we call "rm" in the build system; fixes trac #4916 (Ian Lynagh, 2011-11-19, 1 file, -1/+1)
| | We avoid calling "rm -rf" with no file arguments; this fixes cleaning on Solaris, where that fails.
| |
| | We also check for suspicious arguments: anything containing "..", starting "/", or containing a "*" (you need to call $(wildcard ...) yourself now if you really want globbing). This should make things a little safer.
* | Generate the C main() function when linking a binary (fixes #5373) (Simon Marlow, 2011-11-16, 4 files, -25/+65)
| | Rather than have main() be statically compiled as part of the RTS, we now generate it into the tiny C file that we compile when linking a binary.
| |
| | The main motivation is that we want to pass the settings for the -rtsopts and -with-rtsopts flags into the RTS, rather than relying on fragile linking semantics to override the defaults, which don't work with DLLs on Windows (#5373).
| |
| | In order to do this, we need to extend the API for initialising the RTS, so now we have:
| |
| |   void hs_init_ghc (int *argc, char **argv[], // program arguments
| |                     RtsConfig rts_config);    // RTS configuration
| |
| | hs_init_ghc() can optionally be used instead of hs_init(), and allows passing in configuration options for the RTS. RtsConfig is a struct, which currently has two fields:
| |
| |   typedef struct {
| |       RtsOptsEnabledEnum rts_opts_enabled;
| |       const char *rts_opts;
| |   } RtsConfig;
| |
| | but might have more in the future. There is a default value for the struct, defaultRtsConfig, the idea being that you start with this and override individual fields as necessary.
| |
| | In fact, main() was in a separate static library, libHSrtsmain.a. That's now gone.
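The "start from defaultRtsConfig and override fields" idea can be sketched with mock declarations, so that the example compiles without the RTS headers. The struct layout and the name defaultRtsConfig follow the message above; the enum constants (`DemoRtsOptsNone`, `DemoRtsOptsSafeOnly`, `DemoRtsOptsAll`), the default values and `demo_make_config` are assumptions made purely for illustration.

```c
#include <stddef.h>

/* Mock declarations; in a real program these come from the RTS headers.
 * The enum constants are hypothetical. */
typedef enum {
    DemoRtsOptsNone,
    DemoRtsOptsSafeOnly,
    DemoRtsOptsAll
} RtsOptsEnabledEnum;

typedef struct {
    RtsOptsEnabledEnum rts_opts_enabled;
    const char *rts_opts;
} RtsConfig;

/* Assumed default values, chosen only for this sketch. */
static const RtsConfig defaultRtsConfig = { DemoRtsOptsSafeOnly, NULL };

/* Copy the default, then override only the fields of interest, as a
 * generated main() might do before passing the result to hs_init_ghc(). */
static RtsConfig demo_make_config(void)
{
    RtsConfig conf = defaultRtsConfig;
    conf.rts_opts_enabled = DemoRtsOptsAll; /* models -rtsopts */
    conf.rts_opts = "-N2";                  /* models -with-rtsopts="-N2" */
    return conf;
}
```

Because callers copy the default first, fields added to the struct later pick up sensible values without any change to existing code.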
* | Allow the use of R9 and R10 in primops; fixes trac #5423 (Ian Lynagh, 2011-11-06, 3 files, -2/+12)
| |
* | Add eventlog event for thread labels (Duncan Coutts, 2011-11-04, 1 file, -3/+3)
| | The existing GHC.Conc.labelThread will now also emit the thread label into the eventlog. Profiling tools like ThreadScope could then use the thread labels rather than thread numbers.
* | Overhaul of infrastructure for profiling, coverage (HPC) and breakpoints (Simon Marlow, 2011-11-02, 2 files, -94/+97)
| | User visible changes
| | ====================
| |
| | Profiling
| | ---------
| |
| | Flags renamed (the old ones are still accepted for now):
| |
| |   OLD         NEW
| |   ---------   ---------------
| |   -auto-all   -fprof-auto
| |   -auto       -fprof-exported
| |   -caf-all    -fprof-cafs
| |
| | New flags:
| |
| |   -fprof-auto              Annotates all bindings (not just top-level ones) with SCCs
| |   -fprof-top               Annotates just top-level bindings with SCCs
| |   -fprof-exported          Annotates just exported bindings with SCCs
| |   -fprof-no-count-entries  Do not maintain entry counts when profiling (can make profiled code go faster; useful with heap profiling where entry counts are not used)
| |
| | Cost-centre stacks have a new semantics, which should in most cases result in more useful and intuitive profiles. If you find this not to be the case, please let me know. This is the area where I have been experimenting most, and the current solution is probably not the final version, however it does address all the outstanding bugs and seems to be better than GHC 7.2.
| |
| | Stack traces
| | ------------
| |
| | +RTS -xc now gives more information. If the exception originates from a CAF (as is common, because GHC tends to lift exceptions out to the top-level), then the RTS walks up the stack and reports the stack in the enclosing update frame(s).
| |
| | Result: +RTS -xc is much more useful now - but you still have to compile for profiling to get it. I've played around a little with adding 'head []' to GHC itself, and +RTS -xc does pinpoint the problem quite accurately.
| |
| | I plan to add more facilities for stack tracing (e.g. in GHCi) in the future.
| |
| | Coverage (HPC)
| | --------------
| |
| | * derived instances are now coloured yellow if they weren't used
| | * likewise record field names
| | * entry counts are more accurate (hpc --fun-entry-count)
| | * tab width is now correct (markup was previously off in source with tabs)
| |
| | Internal changes
| | ================
| |
| | In Core, the Note constructor has been replaced by
| |
| |   Tick (Tickish b) (Expr b)
| |
| | which is used to represent all the kinds of source annotation we support: profiling SCCs, HPC ticks, and GHCi breakpoints.
| |
| | Depending on the properties of the Tickish, different transformations apply to Tick. See CoreUtils.mkTick for details.
| |
| | Tickets
| | =======
| |
| | This commit closes the following tickets, test cases to follow:
| |
| | - Close #2552: not a bug, but the behaviour is now more intuitive (test is T2552)
| | - Close #680 (test is T680)
| | - Close #1531 (test is result001)
| | - Close #949 (test is T949)
| | - Close #2466: test case has bitrotted (doesn't compile against current version of vector-space package)
* | Add an RTS eventlog tracing class for user messages (Duncan Coutts, 2011-10-27, 1 file, -0/+1)
| | Enables people to turn them on/off. Defaults to on.
* | Add new eventlog EVENT_WALL_CLOCK_TIME for time matching (Duncan Coutts, 2011-10-26, 1 file, -2/+3)
| | Eventlog timestamps are elapsed times (in nanoseconds) relative to the process start. To be able to merge eventlogs from multiple processes we need to be able to align their timelines. If they share a clock domain (or a user judges that their clocks are sufficiently closely synchronised) then it is sufficient to know how the eventlog timestamps match up with the clock.
| |
| | The EVENT_WALL_CLOCK_TIME contains the clock time with (up to) nanosecond precision. It is otherwise an ordinary event and so contains the usual timestamp for the same moment in time. It therefore enables us to match up all the eventlog timestamps with clock time.
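The alignment arithmetic this enables is a single offset per eventlog. The sketch below assumes a decoded event with a seconds part and a nanoseconds part; the `DemoWallClockEvent` struct, its field names and `demo_to_wall_ns` are hypothetical and do not describe the on-disk event format.

```c
#include <stdint.h>

/* Hypothetical decoded form of an EVENT_WALL_CLOCK_TIME record. */
typedef struct {
    uint64_t timestamp_ns; /* ordinary eventlog timestamp of this event */
    uint64_t wall_sec;     /* clock time at that moment, seconds part */
    uint32_t wall_nsec;    /* clock time at that moment, nanoseconds part */
} DemoWallClockEvent;

/* Convert any eventlog timestamp from the same log to clock time in
 * nanoseconds: clock time at the reference event, shifted by the elapsed
 * time between the reference event and the timestamp in question. */
static uint64_t demo_to_wall_ns(const DemoWallClockEvent *ref, uint64_t timestamp_ns)
{
    uint64_t ref_wall_ns = ref->wall_sec * 1000000000ULL + ref->wall_nsec;
    return ref_wall_ns - ref->timestamp_ns + timestamp_ns;
}
```

Two logs from processes in the same clock domain can then be merged by converting every timestamp with its own log's reference event and sorting on the resulting clock times.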
* | Merge branch 'master' of http://darcs.haskell.org/ghc (Ian Lynagh, 2011-10-19, 1 file, -2/+1)
|\ \
| | | Conflicts: compiler/utils/Platform.hs
| * | Revert "Move freeStablePtr() into the exported API (Lennart wants it)" (Simon Marlow, 2011-10-19, 1 file, -2/+1)
| | | On second thoughts, hs_free_stable_ptr() is the official way to free a StablePtr.
| | |
| | | This reverts commit ae583f2949570755c8a03f68a71416c0fd7f257c.
* | | Put the target platform in the settings file (Ian Lynagh, 2011-10-19, 2 files, -32/+0)
|/ /
* | Move freeStablePtr() into the exported API (Lennart wants it) (Simon Marlow, 2011-10-18, 1 file, -1/+2)
| |
* | make CAFs atomic, to fix #5558 (Simon Marlow, 2011-10-17, 1 file, -2/+2)
|/
| See Note [atomic CAFs] in rts/sm/Storage.c
* Increase the "context stack depth" to 200 (from 20) (Simon Peyton Jones, 2011-09-02, 1 file, -2/+2)
| This parameter controls the allowed depth of reasoning in the type constraint solver. Perfectly well-behaved programs can use deep stacks, and 20 is obviously too small. (Indeed, if you don't have UndecidableInstances, the constraint solver is supposed to terminate, so no limit should be needed.)
|
| Responding to Trac #5395 this patch increases the default to 200.
* Snapshot of codegen refactoring to share with simonpj (Simon Marlow, 2011-08-25, 1 file, -4/+4)
|
* make shutdownHaskellAndExit() shut down the RTS and exit immediately (Simon Marlow, 2011-08-12, 1 file, -1/+5)
| (#5402)
* ARMv5 compatibility for registerized runtime changes. (Stephen Blackheath, 2011-08-10, 4 files, -26/+46)
| When the bootstrap compiler does not include this patch, you must add this line to mk/build.mk, otherwise the ARM architecture cannot be detected due to a -undef option given to the C pre-processor.
|
|   SRC_HC_OPTS = -pgmP 'gcc -E -traditional'
* RTS: fix xchg/cas fcns to invoke memory barrier on ARMv7 platform (Karel Gardas, 2011-08-10, 1 file, -0/+6)
| This patch fixes the RTS' xchg and cas functions. On ARMv7 it is recommended to add a memory barrier after using ldrex/strex to implement an atomic lock or operation.
* implement ARMv7 specific memory barriers (Karel Gardas, 2011-08-10, 1 file, -1/+11)
| This patch provides an implementation of ARMv7 specific memory barriers. It uses dmb sy isn (or shortly dmb) for store/load and load/load barriers and dmb st isn for the store/store barrier.
* add support for STG floating-point regs using VFPv3 (Karel Gardas, 2011-08-10, 1 file, -2/+44)
| This patch adds a mapping for STG floating point registers using ARM VFPv3. Since I'm using just d8-d11, processors that implement only VFPv3-D16 should also work (e.g. NVidia Tegra2, Marvell Dove)
* make StgReturn and cas functions Thumb friendly (Karel Gardas, 2011-08-10, 1 file, -0/+2)
|
* implement ARMv6/7 specific xchg function (Karel Gardas, 2011-08-10, 1 file, -2/+18)
|
* Stephen Blackheath's GHC/ARM registerised port (Karel Gardas, 2011-08-10, 2 files, -0/+64)
| This is Stephen Blackheath's GHC/ARM registerised port, which uses a modified version of LLVM and provides basic registerised build functionality