| Commit message | Author | Age | Files | Lines |
|
| |
This is a batch of refactoring to remove some of the GC's global
state, as we move towards CPU-local GC.
- allocateLocal() now allocates large objects into the local
  nursery, rather than taking a global lock and allocating
  them in gen 0 step 0.
- allocatePinned() was still allocating from global storage and
  taking a lock each time; now it uses local storage.
  (mallocForeignPtrBytes should be faster with -threaded; see the
  usage sketch after this list.)
- We had a gen 0 step 0, distinct from the nurseries, which are
stored in a separate nurseries[] array. This is slightly strange.
I removed the g0s0 global that pointed to gen 0 step 0, and
removed all uses of it. I think now we don't use gen 0 step 0 at
all, except possibly when there is only one generation. Possibly
more tidying up is needed here.
- I removed the global allocate() function, and renamed
allocateLocal() to allocate().
- the alloc_blocks global is gone. MAYBE_GC() and
doYouWantToGC() now check the local nursery only.
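As a rough usage sketch (not part of the patch): mallocForeignPtrBytes
is the ordinary library route to pinned allocation, so it is the kind
of call that now goes through the cheaper allocatePinned() path:

    -- Sketch only (not from the patch): pinned allocation through
    -- mallocForeignPtrBytes, which with -threaded now uses local
    -- storage instead of a global lock.
    import Data.Word (Word8)
    import Foreign.ForeignPtr (mallocForeignPtrBytes, withForeignPtr)
    import Foreign.Storable (peekByteOff, pokeByteOff)

    main :: IO ()
    main = do
      fp <- mallocForeignPtrBytes 4096          -- pinned, GC-managed buffer
      withForeignPtr fp $ \p -> do
        pokeByteOff p 0 (0xab :: Word8)         -- use it like raw memory
        b <- peekByteOff p 0 :: IO Word8
        print b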
|
| |
While fixing #3578 I noticed that this function was just a field
access to StgTRecHeader, so I inlined it manually.
|
| |
added:
primop TraceEventOp "traceEvent#" GenPrimOp
Addr# -> State# s -> State# s
{ Emits an event via the RTS tracing framework. The content
of the event is the zero-terminated byte string passed as the first
argument. The event will be emitted either to the .eventlog file,
or to stderr, depending on the runtime RTS flags. }
and added the required RTS functionality to support it. Also a bit of
refactoring in the RTS tracing code.
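For illustration only (not part of this commit): user code normally
reaches traceEvent# through Debug.Trace; a minimal sketch, assuming a
base library that exports traceEventIO:

    -- Sketch only: emit user events into the RTS tracing framework.
    -- Link with -eventlog if your GHC needs it, and run with
    -- +RTS -l to write a .eventlog file.
    import Debug.Trace (traceEventIO)

    main :: IO ()
    main = do
      traceEventIO "START summing"
      let s = sum [1 .. 1000000 :: Int]
      s `seq` traceEventIO "STOP summing"
      print s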
|
| |
There were two bugs, and had it not been for the first one we would
not have noticed the second one, so this is quite fortunate.
The first bug is in stg_unblockAsyncExceptionszh_ret: when we find a
pending exception to raise but don't end up raising it, there was a
missing adjustment to the stack pointer.
The second bug was that this case was actually happening at all: it
ought to be incredibly rare, because the pending exception thread
would have to be killed between us finding it and attempting to raise
the exception. This made me suspicious. It turned out that there was
a race condition on the tso->flags field; multiple threads were
updating this bitmask field non-atomically (one of the bits is the
dirty-bit for the generational GC). The fix is to move the dirty bit
into its own field of the TSO, making the TSO one word larger (sadly).
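For illustration only (not part of the fix): the user-level pattern
behind this code path is an exception thrown at a thread while it has
async exceptions blocked, which is then raised when the thread
unblocks; a minimal sketch using the modern mask API:

    -- Sketch only (not from the fix): an async exception thrown at a
    -- thread with exceptions blocked stays pending, and is raised when
    -- the thread unblocks (the stg_unblockAsyncExceptionszh_ret path).
    import Control.Concurrent (forkIO, threadDelay, throwTo)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import Control.Exception (ErrorCall (..), try, uninterruptibleMask_)

    main :: IO ()
    main = do
      ready <- newEmptyMVar
      done  <- newEmptyMVar
      t <- forkIO $ do
             r <- try $ uninterruptibleMask_ $ do
                    putMVar ready ()      -- async exceptions blocked here
                    threadDelay 100000    -- the throwTo below stays pending
             -- leaving the masked region unblocks exceptions; the
             -- pending one is raised and caught by try
             putMVar done (r :: Either ErrorCall ())
      takeMVar ready
      throwTo t (ErrorCall "pending")     -- queued on the target thread
      takeMVar done >>= print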
|
| |
For consistency with other RTS exported symbols
|
| |
Using global temp vars is really ugly and in the threaded case it
needs slots in the StgRegTable. It'd also be pretty silly once we
move the cmm primops out of the rts, into the integer-gmp package.
|
| |
This reduces the latency between a context-switch being triggered and
the thread returning to the scheduler, which in turn should reduce the
cost of the GC barrier when there are many cores.
We still retain the old context_switch flag, which is checked at the
end of each block of allocation. The idea is that setting HpLim may
fail if the target thread is modifying HpLim at the same time; the
context_switch flag is a fallback. It also allows us to "context
switch soon" without forcing an immediate switch, which can be costly.
|
| |
- add newAlignedPinnedByteArray# for allocating pinned BAs with
arbitrary alignment
- the old newPinnedByteArray# now aligns to 16 bytes
Foreign.alloca will use newAlignedPinnedByteArray#, and so might end
up wasting less space than before (we used to align to 8 by default).
Foreign.allocaBytes and Foreign.mallocForeignPtrBytes will get 16-byte
aligned memory, which is enough to avoid problems with SSE
instructions on x86, for example.
There was a bug in the old newPinnedByteArray#: it aligned to 8 bytes,
but would have failed if the header was not a multiple of 8
(fortunately it always was, even with profiling). Also we
occasionally wasted some space unnecessarily due to alignment in
allocatePinned().
I haven't done anything about Foreign.malloc/mallocBytes, which will
give you the same alignment guarantees as malloc() (8 bytes on
Linux/x86 here).
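A rough usage sketch (not part of the patch), exercising the library
entry points mentioned above; allocaBytesAligned from
Foreign.Marshal.Alloc is assumed here as the aligned-allocation
wrapper:

    -- Sketch only: check the alignment of pinned allocations made
    -- through the library wrappers over newPinnedByteArray# and
    -- newAlignedPinnedByteArray#.
    import Data.Word (Word8)
    import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
    import Foreign.Marshal.Alloc (allocaBytesAligned)
    import Foreign.Ptr (Ptr, ptrToIntPtr)

    misalignment :: Ptr a -> Integer -> Integer
    misalignment p n = toInteger (ptrToIntPtr p) `mod` n

    main :: IO ()
    main = do
      -- mallocForeignPtrBytes now hands back 16-byte aligned pinned memory
      fp <- mallocForeignPtrBytes 64 :: IO (ForeignPtr Word8)
      withForeignPtr fp $ \p -> print (misalignment p 16)      -- expect 0
      -- allocaBytesAligned takes an explicit alignment (32 bytes here)
      allocaBytesAligned 64 32 $ \p ->
        print (misalignment (p :: Ptr Word8) 32)               -- expect 0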
|
| |
In this version, I untag R1 before using it, and even enter R2 at the
end rather than simply returning it (which didn't work right when R2
was a thunk).
|
| |
no longer reachable.
Patch originally by Ivan Tomac <tomac@pacific.net.au>, amended by
Simon Marlow:
- mkWeakFinalizer# commoned up with mkWeakFinalizerEnv#
- GC parameters to ALLOC_PRIM fixed
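For illustration only (not part of the patch): the user-visible
behaviour behind these primops is that a finalizer attached to a weak
pointer runs once its key is no longer reachable; a minimal sketch
using System.Mem.Weak:

    -- Sketch only: a finalizer attached with mkWeak runs after its key
    -- becomes unreachable and a GC notices it.
    import Control.Concurrent (threadDelay)
    import Data.IORef (newIORef, writeIORef)
    import System.Mem (performGC)
    import System.Mem.Weak (mkWeak)

    main :: IO ()
    main = do
      key <- newIORef (0 :: Int)    -- mutable objects make stable keys
      _w  <- mkWeak key () (Just (putStrLn "finalizer ran"))
      writeIORef key 1              -- last use of the key
      performGC                     -- key now unreachable; finalizer may run
      threadDelay 100000            -- give the finalizer a moment to print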
|
| |
This merge does not turn on the new codegen (which only compiles
a select few programs at this point),
but it does introduce some changes to the old code generator.
The high bits:
1. The Rep Swamp patch is finally here.
The highlight is that the representation of types at the
machine level has changed.
Consequently, this patch contains updates across several back ends.
2. The new Stg -> Cmm path is here, although it appears to have a
fair number of bugs lurking.
3. Many improvements along the CmmCPSZ path, including:
o stack layout
o some code for infotables, half of which is right and half wrong
o proc-point splitting
|
| |
lost in patch "Run sparks in batches"
|
| |
Significantly reduces the overhead for par, which means that we can
make use of parallelism at a much finer granularity.
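As an illustration (not from the patch), the sort of fine-grained
sparking this makes cheaper, using par/pseq from the parallel
package's Control.Parallel:

    -- Sketch only: spark one branch of each call with par; the patch
    -- lowers the per-spark overhead, so finer granularity pays off.
    -- Build with -threaded and run with +RTS -N to use several cores.
    import Control.Parallel (par, pseq)

    pfib :: Int -> Integer
    pfib n
      | n < 20    = fib n                        -- too small to spark
      | otherwise = x `par` (y `pseq` (x + y))   -- evaluate x in a spark
      where
        x = pfib (n - 1)
        y = pfib (n - 2)

    fib :: Int -> Integer
    fib n = if n < 2 then fromIntegral n else fib (n - 1) + fib (n - 2)

    main :: IO ()
    main = print (pfib 30)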
|
| |
This should improve scaling when using atomicModifyIORef
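For illustration (not part of the patch), the contended pattern in
question: several threads updating a single IORef with atomic
modifications; a minimal sketch (atomicModifyIORef' is used only to
keep the accumulated value evaluated):

    -- Sketch only: many threads hammering one IORef.  Build with
    -- -threaded and run with +RTS -N to see the scaling behaviour.
    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import Control.Monad (forM, forM_, replicateM_)
    import Data.IORef (atomicModifyIORef', newIORef, readIORef)

    main :: IO ()
    main = do
      counter <- newIORef (0 :: Int)
      dones <- forM [1 .. 8 :: Int] $ \_ -> do
        done <- newEmptyMVar
        _ <- forkIO $ do
          replicateM_ 100000 $
            atomicModifyIORef' counter (\n -> (n + 1, ()))
          putMVar done ()
        return done
      forM_ dones takeMVar
      readIORef counter >>= print             -- expect 800000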
|
| |
Fixes a long-standing bug that could in some cases cause sub-optimal
scheduling behaviour.
|
| |
When returning an unboxed tuple with a single non-void component, we
now use the same calling convention as for returning a value of the
same type as that component. This means that the return convention
for IO now doesn't vary depending on the platform, which makes some
parts of the RTS simpler, and fixes a problem I was having with making
the FFI work in unregisterised GHCi (the byte-code compiler makes
some assumptions about calling conventions to keep things simple).
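As background (not part of the commit): IO itself is the main example
of this case, being a state-passing function that returns an unboxed
pair whose State# RealWorld component is void at run time; a small
sketch against that representation:

    {-# LANGUAGE UnboxedTuples #-}
    -- Sketch only: IO a is a function from State# RealWorld to the
    -- unboxed pair (# State# RealWorld, a #).  The State# component is
    -- zero-width, so the pair has a single non-void component -- the
    -- case whose return convention this commit changes.
    import GHC.Types (IO (..))

    hello :: IO ()
    hello = IO (\s -> (# s, () #))    -- return an unboxed pair directly

    main :: IO ()
    main = hello >> putStrLn "done"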
|
| |
fixes crash with -threaded -debug for me
|
| |
This showed up as a crash in conc032 for me.
|
| |
In delayzh_fast we act as if tickInterval was 50, not 0.
|
| |
Patch from Mike Gunter.
|
| |
This has several advantages:
- -fvia-C is consistent with -fasm with respect to FFI declarations:
both bind to the ABI, not the API.
- foreign calls can now be inlined freely across module boundaries, since
a header file is not required when compiling the call.
- bootstrapping via C will be more reliable, because this difference
  in behaviour between the two backends has been removed.
There is one disadvantage:
- we get no checking by the C compiler that the FFI declaration
is correct.
So now, the c-includes field in a .cabal file is always ignored by
GHC, as are header files specified in an FFI declaration. This was
previously the case only for -fasm compilations; now it is the case
for -fvia-C too.
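For illustration (not from the patch), an FFI declaration of the kind
affected; the header named in the import is now ignored by both
backends, and only the symbol and the declared type (the ABI) matter:

    {-# LANGUAGE ForeignFunctionInterface #-}
    -- Sketch only: the "math.h" part of this declaration is no longer
    -- consulted when compiling the call, with -fasm or -fvia-C alike.
    import Foreign.C.Types (CDouble)

    foreign import ccall unsafe "math.h cos"
      c_cos :: CDouble -> CDouble

    main :: IO ()
    main = print (c_cos 0)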
|
| |
Previously MVars were always on the mutable list of the old
generation, which meant every MVar was visited during every minor GC.
With lots of MVars hanging around, this gets expensive. We addressed
this problem for MUT_VARs (aka IORefs) a while ago; the solution is to
use a traditional GC write-barrier when the object is modified. This
patch does the same thing for MVars.
TVars are still done the old way; they could probably benefit from the
same treatment too.
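As illustration only (not part of the patch): the workload that
benefits is a program holding many long-lived, mostly idle MVars while
allocating normally, since minor GCs no longer have to visit the
unmodified ones; a rough sketch:

    -- Sketch only: lots of long-lived MVars that are rarely written.
    -- With the write barrier, a minor GC only revisits an MVar after
    -- it has actually been modified.
    import Control.Concurrent.MVar (modifyMVar_, newMVar, readMVar)
    import Control.Monad (forM_)

    main :: IO ()
    main = do
      mvars <- mapM newMVar [1 .. 100000 :: Int]   -- long-lived, mostly idle
      forM_ [1 .. 100 :: Int] $ \i ->              -- ordinary allocation churn
        print (sum [i .. i + 10000])
      modifyMVar_ (head mvars) (return . (+ 1))    -- dirties just this one
      readMVar (head mvars) >>= print              -- expect 2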
|
| |
This applies to EnterCriticalSection and LeaveCriticalSection in the RTS
|
| |
The C-- parser was missing the "stdcall" calling convention for
foreign calls; once it is added, we can call {Enter,Leave}CriticalSection
directly.
|
| |
* The correct definition of C-- requires that a procedure not
'fall off the end'. The 'never returns' annotation tells us
if a (foreign) call is not going to return.
Validated!
|