| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The parallel GC was using setContextSwitches() to stop all the other
threads, which sets the context_switch flag on every Capability. That
had the side effect of causing every Capability to also switch
threads, and since GCs can be much more frequent than context
switches, this increased the context switch frequency. When context
switches are expensive (because the switch is between two bound
threads or a bound and unbound thread), the difference is quite
noticeable.
The fix is to have a separate flag to indicate that a Capability
should stop and return to the scheduler, but not switch threads. I've
called this the "interrupt" flag.
|
| |
|
|
|
|
| |
Spotted by gdb's malloc debugger while I was looking for something else.
|
|
|
|
| |
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
|
| |
|
|
|
|
|
| |
Returns a pointer to the current cost-centre stack when profiling,
NULL otherwise.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This means that both time and heap profiling work for parallel
programs. Main internal changes:
- CCCS is no longer a global variable; it is now another
pseudo-register in the StgRegTable struct. Thus every
Capability has its own CCCS.
- There is a new built-in CCS called "IDLE", which records ticks for
Capabilities in the idle state. If you profile a single-threaded
program with +RTS -N2, you'll see about 50% of time in "IDLE".
- There is appropriate locking in rts/Profiling.c to protect the
shared cost-centre-stack data structures.
This patch does enough to get it working, I have cut one big corner:
the cost-centre-stack data structure is still shared amongst all
Capabilities, which means that multiple Capabilities will race when
updating the "allocations" and "entries" fields of a CCS. Not only
does this give unpredictable results, but it runs very slowly due to
cache line bouncing.
It is strongly recommended that you use -fno-prof-count-entries to
disable the "entries" count when profiling parallel programs. (I shall
add a note to this effect to the docs).
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Terminology cleanup: the type "Ticks" has been renamed "Time", which
is an StgWord64 in units of TIME_RESOLUTION (currently nanoseconds).
The terminology "tick" is now used consistently to mean the interval
between timer signals.
The ticker now always ticks in realtime (actually CLOCK_MONOTONIC if
we have it). Before it used CPU time in the non-threaded RTS and
realtime in the threaded RTS, but I've discovered that the CPU timer
has terrible resolution (at least on Linux) and isn't much use for
profiling. So now we always use realtime. This should also fix
The default tick interval is now 10ms, except when profiling where we
drop it to 1ms. This gives more accurate profiles without affecting
runtime too much (<1%).
Lots of cleanups - the resolution of Time is now in one place
only (Rts.h) rather than having calculations that depend on the
resolution scattered all over the RTS. I hope I found them all.
|
|
|
|
|
| |
Based on a patch from Arnaud Degroote <degroote@NetBSD.org> in
trac #5480.
|
|
|
|
| |
Was causing occasional failure in some threaded2 tests.
|
|
|
|
| |
Fixes ticket #5472
|
|
|
|
| |
hppa1, m68k
|
| |
|
| |
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
LLVM does not support the __thread attribute for thread
local storage and may generate incorrect code for global
register variables. We want to allow building the runtime with
LLVM-based compilers such as llvm-gcc and clang,
particularly for MacOS.
This patch changes the gct variable used by the garbage
collector to use pthread_getspecific() for thread local
storage when an llvm based compiler is used to build the
runtime.
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| |
| | |
We now manage the stack correctly on both x86 and i386, keeping
the stack align at (16n bytes - word size) on function entry
and at (16n bytes) on function calls. This gives us compatability
with LLVM and GCC.
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Rather than have main() be statically compiled as part of the RTS, we
now generate it into the tiny C file that we compile when linking a
binary.
The main motivation is that we want to pass the settings for the
-rtsotps and -with-rtsopts flags into the RTS, rather than relying on
fragile linking semantics to override the defaults, which don't work
with DLLs on Windows (#5373). In order to do this, we need to extend
the API for initialising the RTS, so now we have:
void hs_init_ghc (int *argc, char **argv[], // program arguments
RtsConfig rts_config); // RTS configuration
hs_init_ghc() can optionally be used instead of hs_init(), and allows
passing in configuration options for the RTS. RtsConfig is a struct,
which currently has two fields:
typedef struct {
RtsOptsEnabledEnum rts_opts_enabled;
const char *rts_opts;
} RtsConfig;
but might have more in the future. There is a default value for the
struct, defaultRtsConfig, the idea being that you start with this and
override individual fields as necessary.
In fact, main() was in a separate static library, libHSrtsmain.a.
That's now gone.
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| | |
(fix build failure with -split-objs)
|
| |
| |
| |
| |
| | |
Also, use the Win32 API (CreateThread) instead of the CRT API
(_beginthreadex) for thread creation.
|
| |
| |
| |
| |
| |
| |
| | |
Ensures that these handles are flushed even when the RTS is being used
as a library, with no main.
Needs a corresponding change to libraries/base.
|
| | |
|
| |
| |
| |
| |
| |
| | |
The existing GHC.Conc.labelThread will now also emit the the thread
label into the eventlog. Profiling tools like ThreadScope could then
use the thread labels rather than thread numbers.
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
User visible changes
====================
Profilng
--------
Flags renamed (the old ones are still accepted for now):
OLD NEW
--------- ------------
-auto-all -fprof-auto
-auto -fprof-exported
-caf-all -fprof-cafs
New flags:
-fprof-auto Annotates all bindings (not just top-level
ones) with SCCs
-fprof-top Annotates just top-level bindings with SCCs
-fprof-exported Annotates just exported bindings with SCCs
-fprof-no-count-entries Do not maintain entry counts when profiling
(can make profiled code go faster; useful with
heap profiling where entry counts are not used)
Cost-centre stacks have a new semantics, which should in most cases
result in more useful and intuitive profiles. If you find this not to
be the case, please let me know. This is the area where I have been
experimenting most, and the current solution is probably not the
final version, however it does address all the outstanding bugs and
seems to be better than GHC 7.2.
Stack traces
------------
+RTS -xc now gives more information. If the exception originates from
a CAF (as is common, because GHC tends to lift exceptions out to the
top-level), then the RTS walks up the stack and reports the stack in
the enclosing update frame(s).
Result: +RTS -xc is much more useful now - but you still have to
compile for profiling to get it. I've played around a little with
adding 'head []' to GHC itself, and +RTS -xc does pinpoint the problem
quite accurately.
I plan to add more facilities for stack tracing (e.g. in GHCi) in the
future.
Coverage (HPC)
--------------
* derived instances are now coloured yellow if they weren't used
* likewise record field names
* entry counts are more accurate (hpc --fun-entry-count)
* tab width is now correct (markup was previously off in source with
tabs)
Internal changes
================
In Core, the Note constructor has been replaced by
Tick (Tickish b) (Expr b)
which is used to represent all the kinds of source annotation we
support: profiling SCCs, HPC ticks, and GHCi breakpoints.
Depending on the properties of the Tickish, different transformations
apply to Tick. See CoreUtils.mkTick for details.
Tickets
=======
This commit closes the following tickets, test cases to follow:
- Close #2552: not a bug, but the behaviour is now more intuitive
(test is T2552)
- Close #680 (test is T680)
- Close #1531 (test is result001)
- Close #949 (test is T949)
- Close #2466: test case has bitrotted (doesn't compile against current
version of vector-space package)
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch changes the STG code so that %rsp to be aligned
to a 16-byte boundary + 8. This is the alignment required by
the x86_64 ABI on entry to a function. Previously we kept
%rsp aligned to a 16-byte boundary, but this was causing
problems for the LLVM backend (see #4211).
We now don't need to invoke llvm stack mangler on
x86_64 targets. Since the stack is now 16+8 byte algined in
STG land on x86_64, we don't need to mangle the stack
manipulations with the llvm mangler.
This patch only modifies the alignement for x86_64 backends.
Signed-off-by: David Terei <davidterei@gmail.com>
|
| | |
|
| |
| |
| |
| |
| | |
I naively assumed that mingw would not have unistd.h or sys/types
but it has both, yet does not have getuid() and friends.
|
| |
| |
| |
| |
| |
| |
| | |
Without any <file> arg, these flags just dump info to stderr so
are at most a mild information disclosure danger. We disallow
a <file> arg in the default -rtsopts=some mode since that will
overwrite the given file.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Ticket #3910 originally pointed out that the RTS options are a potential
security problem. For example the -t -s or -S flags can be used to
overwrite files. This would be bad in the context of CGI scripts or
setuid binaries. So we introduced a system where +RTS processing is more
or less disabled unless you pass the -rtsopts flag at link time.
This scheme is safe enough but it also really annoies users. They have
to use -rtsopts in many circumstances: with -threaded to use -N, with
-eventlog to use -l, with -prof to use any of the profiling flags. Many
users just set -rtsopts globally or in project .cabal files. Apart from
annoying users it reduces security because it means that deployed
binaries will have all RTS options enabled rather than just profiling
ones.
This patch relaxes the set of RTS options that are available in the
default -rtsopts=some case. For "deployment" ways like vanilla and
-threaded we remain quite conservative. Only --info -? --help are
allowed for vanilla. For -threaded, -N and -N<x> are allowed with a
check that x <= num cpus.
For "developer" ways like -debug, -eventlog, -prof, we allow all the
options that are special to that way. Some of these allow writing files,
but the file written is not directly under the control of the attacker.
For the setuid case (where the attacker would have control over binary
name, current dir, local symlinks etc) we check if the process is
running setuid/setgid and refuse all RTS option processing. Users would
need to use -rtsopts=all in this case.
We are making the assumption that developers will not deploy binaries
built in the -debug, -eventlog, -prof ways. And even if they do, the
damage should be limited to DOS, information disclosure and writing
files like <progname>.eventlog, not arbitrary files.
|
| |
| |
| |
| |
| | |
Otherwise we can use +RTS -N-1 and get 2^32 or 2^64 capabilities
which doesn't work out so well...
|
| | |
|