| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
DEBUG imposes a significant performance hit in the GC, yet we often
want some of the debugging output, so -vg gives us the cheap trace
messages without the sanity checking of DEBUG, just like -vs for the
scheduler.
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
When a stack occupies less than 1/4 of the memory it owns, and is
larger than a megablock, we release half of it. Shrinking is O(1): it
doesn't need to copy the stack.
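The shrinking rule above can be sketched roughly as follows. This is an illustration only, not the RTS code: stack_info, shrink_stack and the 1 MB MEGABLOCK_BYTES value are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed megablock size for this sketch; the real RTS constant may differ. */
#define MEGABLOCK_BYTES ((size_t)1 << 20)

/* Hypothetical stack descriptor holding only the two sizes the rule needs. */
typedef struct {
    size_t owned_bytes;  /* memory the stack owns */
    size_t used_bytes;   /* memory currently occupied by frames */
} stack_info;

/* Halve the owned size if the stack uses less than 1/4 of its memory and
 * is larger than a megablock; otherwise leave it alone.
 * Returns the (possibly reduced) owned size. */
size_t shrink_stack(stack_info *s)
{
    assert(s->used_bytes <= s->owned_bytes);
    if (s->owned_bytes > MEGABLOCK_BYTES &&
        s->used_bytes < s->owned_bytes / 4) {
        s->owned_bytes /= 2;  /* O(1): only bookkeeping changes, no copy */
    }
    return s->owned_bytes;
}
```

Because the live part of the stack fits in a quarter of the owned memory, releasing half never touches live frames, which is why no copy is needed.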
|
| |
|
| |
|
|
|
|
| |
in addition to checking for leaks
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
- major (multithreaded) GC is measured separately from minor GC
- events to measure can now be specified on the command line, e.g.
prog +RTS -a+PAPI_TOT_CYC
|
|
|
|
|
|
|
|
|
| |
e.g. use +RTS -g2 -RTS for 2 threads. Only major GCs are parallelised;
minor GCs are still sequential. Don't use more threads than you
have CPUs.
It works most of the time, although you won't see much speedup yet.
Tuning and more work on stability are still required.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch localises the state of the GC into a gc_thread structure,
and reorganises the inner loop of the GC to scavenge one block at a
time from global work lists in each "step". The gc_thread structure
has a "workspace" for each step, in which it collects evacuated
objects until it has a full block to push out to the step's global
list. Details of the algorithm will be on the wiki in due course.
At the moment, THREADED_RTS does not compile, but the single-threaded
GC works (and is 10-20% slower than before).
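A rough single-threaded sketch of the workspace idea described above: objects are collected in a private block, and the block is pushed to the step's global list once it is full. All names here (block, step, workspace, ws_evacuate, step_take, BLOCK_WORDS) are made up for illustration and do not match the real gc_thread layout; locking of the global list is omitted.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCK_WORDS 4  /* tiny block size, for the sketch only */

typedef struct block {
    size_t used;
    struct block *next;
    long words[BLOCK_WORDS];
} block;

typedef struct { block *todo; } step;          /* global work list */
typedef struct { block *current; } workspace;  /* per-thread, per-step */

/* Add one evacuated word to the workspace; push the block to the
 * step's global list when it becomes full. Returns 0, or -1 on OOM. */
int ws_evacuate(workspace *ws, step *st, long word)
{
    if (ws->current == NULL) {
        ws->current = calloc(1, sizeof(block));
        if (ws->current == NULL) return -1;
    }
    assert(ws->current->used < BLOCK_WORDS);
    ws->current->words[ws->current->used++] = word;
    if (ws->current->used == BLOCK_WORDS) {
        ws->current->next = st->todo;  /* full: hand over to the step */
        st->todo = ws->current;
        ws->current = NULL;
    }
    return 0;
}

/* Take one block to scavenge from the global list, or NULL if empty. */
block *step_take(step *st)
{
    block *b = st->todo;
    if (b != NULL) {
        st->todo = b->next;
        b->next = NULL;
    }
    return b;
}
```

In a threaded build the push and take on the global list would need synchronisation; batching into whole blocks keeps that cost to once per block rather than once per object.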
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Include TickyCounters.h in Stg.h if we are doing Ticky Ticky.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This has several advantages:
- -fvia-C is consistent with -fasm with respect to FFI declarations:
both bind to the ABI, not the API.
- foreign calls can now be inlined freely across module boundaries, since
a header file is not required when compiling the call.
- bootstrapping via C will be more reliable, because this difference
in behaviour between the two backends has been removed.
There is one disadvantage:
- we get no checking by the C compiler that the FFI declaration
is correct.
So now, the c-includes field in a .cabal file is always ignored by
GHC, as are header files specified in an FFI declaration. This was
previously the case only for -fasm compilations; now it is also the
case for -fvia-C.
|
| |
|
|
|
|
| |
For some reason this causes build failures for me in my 32-bit chroot,
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
File locking (of the Haskell 98 variety) was previously done using a
static table with linear search, which had two problems: the array had
a fixed size and was sometimes too small (#1109), and performance of
lockFile/unlockFile was suboptimal due to the linear search.
Also the algorithm failed to count readers as required by Haskell 98
(#629).
Now it's done using a hash table (provided by the RTS). Furthermore I
avoided the extra fstat() for every open file by passing the dev_t and
ino_t into lockFile. This and the improvements to the locking
algorithm result in a healthy 20% or so performance increase for
opening/closing files (see openFile008 test).
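A minimal sketch of the approach described above, assuming a chained hash table keyed by (dev_t, ino_t) with a reader count per entry. It is not the RTS implementation: lock_file, unlock_file and the table layout are hypothetical, and the real lockFile has a different signature.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

#define N_BUCKETS 64  /* number of chains; entries are not bounded */

typedef struct lock_entry {
    dev_t dev;
    ino_t ino;
    int readers;  /* > 0: number of readers; -1: a single writer */
    struct lock_entry *next;
} lock_entry;

static lock_entry *buckets[N_BUCKETS];

/* Pointer to the link holding the entry for (dev, ino), or to the
 * NULL link at the end of its chain if there is no such entry. */
static lock_entry **find_slot(dev_t dev, ino_t ino)
{
    unsigned long long h =
        (unsigned long long)dev * 31u + (unsigned long long)ino;
    lock_entry **p = &buckets[h % N_BUCKETS];
    while (*p != NULL && !((*p)->dev == dev && (*p)->ino == ino))
        p = &(*p)->next;
    return p;
}

/* Multiple readers or one writer. Returns 0 on success, -1 if the
 * file is already locked incompatibly (or on allocation failure). */
int lock_file(dev_t dev, ino_t ino, int for_writing)
{
    lock_entry **slot = find_slot(dev, ino);
    if (*slot == NULL) {
        lock_entry *e = malloc(sizeof *e);
        if (e == NULL) return -1;
        e->dev = dev;
        e->ino = ino;
        e->readers = for_writing ? -1 : 1;
        e->next = NULL;
        *slot = e;
        return 0;
    }
    if (for_writing || (*slot)->readers < 0) return -1;
    (*slot)->readers++;  /* count readers */
    return 0;
}

/* Drop one lock; the entry is removed when the last holder leaves. */
int unlock_file(dev_t dev, ino_t ino)
{
    lock_entry **slot = find_slot(dev, ino);
    lock_entry *e = *slot;
    if (e == NULL) return -1;
    assert(e->readers != 0);
    if (e->readers > 1) { e->readers--; return 0; }
    *slot = e->next;
    free(e);
    return 0;
}
```

Taking the device and inode as arguments mirrors the point made above: the caller already has them from opening the file, so the lock routine does not need its own fstat().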
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hs_hpc_module() prototype in RtsExternal.h didn't match its usage:
we were passing StgWord-sized parameters but the prototype used C
ints. I think it accidentally worked because we only ever passed
constants that got promoted. The constants unfortunately were
sometimes negative, which caused the C compiler to emit warnings.
I suspect PprC.pprHexVal may be wrong to emit negative constants in
the generated C, but I'm not completely sure. Anyway, it's easy to
fix this in CgHpc, which is what I've done.
|
|
|
|
| |
Fixes some gratuitous warnings when compiling via C with -fhpc
|
|
|
|
|
|
|
| |
For some reason the C-- version of recordMutable wasn't verifying that
the object was in an old generation before attempting to add it to the
mutable list, and this broke maessen_hashtab. This version of
recordMutable is only used in unsafeThaw#.
|
|
|
|
|
|
|
|
| |
Now allocate() is a synonym for allocateInGen().
I also made various cleanups: there is now less special-case code for
supporting -G1 (two-space collection), and -G1 now works with
-threaded.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously MVars were always on the mutable list of the old
generation, which meant every MVar was visited during every minor GC.
With lots of MVars hanging around, this gets expensive. We addressed
this problem for MUT_VARs (aka IORefs) a while ago: the solution is to
use a traditional GC write-barrier when the object is modified. This
patch does the same thing for MVars.
TVars are still done the old way; they could probably benefit from the
same treatment too.
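For illustration, a write barrier of this kind might look roughly like the sketch below: an object is added to the mutable list only when it lives in an old generation and has not already been recorded. The mut_obj layout, the dirty flag and the fixed-size list are assumptions for the example, not the RTS representation.

```c
#include <assert.h>
#include <stddef.h>

#define MUT_LIST_MAX 16  /* tiny fixed capacity, for the sketch only */

/* Hypothetical mutable object. */
typedef struct {
    int   generation;  /* 0 = young, > 0 = old */
    int   dirty;       /* already on the mutable list since the last GC? */
    void *value;
} mut_obj;

static mut_obj *mut_list[MUT_LIST_MAX];
static size_t   mut_list_len = 0;

/* Write barrier: record an old-generation object the first time it
 * is modified, then perform the write. */
void write_obj(mut_obj *o, void *new_value)
{
    if (o->generation > 0 && !o->dirty) {
        assert(mut_list_len < MUT_LIST_MAX);
        o->dirty = 1;
        mut_list[mut_list_len++] = o;
    }
    o->value = new_value;
}

/* What a minor GC would do after scanning the list: clear the marks. */
void reset_mut_list(void)
{
    for (size_t i = 0; i < mut_list_len; i++)
        mut_list[i]->dirty = 0;
    mut_list_len = 0;
}
```

With this scheme a minor GC visits only the objects that were actually written to since the previous collection, instead of every mutable object in the old generation.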
|
| |
|
|
|
|
|
|
|
|
|
| |
The extra safe points introduced for breakpoints were previously
compiled as normal updatable thunks, but they are guaranteed
single-entry, so we can use non-updatable thunks here. This restores
the tail-call property where it was lost in some cases (although stack
squeezing probably often recovered it), and should improve
performance.
|
|
|
|
|
| |
addDLL returns const char*, not just a char*.
Fix compiler warning
|
| |
|
|
|
|
|
|
|
|
|
|
| |
An AP_STACK now ensures that there are at least AP_STACK_SPLIM words of
stack headroom available after unpacking the payload. Continuations
that require more than AP_STACK_SPLIM words of stack must do their own
stack checks instead of aggregating their stack usage into the parent
frame. I have made this change for the interpreter, but not for
compiled code yet - we should do this in the glorious rewrite of the
code generator.
|
|
|
|
|
|
| |
The C-- parser was missing the "stdcall" calling convention for
foreign calls; now that it has been added, we can call
{Enter,Leave}CriticalSection directly.
|
| |
|
| |
|
|
|
|
|
| |
When calling EnterCriticalSection and LeaveCriticalSection from C--
code, we go via wrappers which use ccall (rather than stdcall).
|