path: root/rts/PrimOps.cmm
Commit message (Author, Age, Files, Lines)
* Make allocatePinned use local storage, and other refactorings (Simon Marlow, 2009-12-01, 1 file, -5/+5)
| This is a batch of refactoring to remove some of the GC's global state, as we move towards CPU-local GC.
| - allocateLocal() now allocates large objects into the local nursery, rather than taking a global lock and allocating them in gen 0 step 0.
| - allocatePinned() was still allocating from global storage and taking a lock each time; now it uses local storage (mallocForeignPtrBytes should be faster with -threaded).
| - We had a gen 0 step 0, distinct from the nurseries, which are stored in a separate nurseries[] array. This is slightly strange. I removed the g0s0 global that pointed to gen 0 step 0, and removed all uses of it. I think now we don't use gen 0 step 0 at all, except possibly when there is only one generation. Possibly more tidying up is needed here.
| - I removed the global allocate() function, and renamed allocateLocal() to allocate().
| - The alloc_blocks global is gone. MAYBE_GC() and doYouWantToGC() now check the local nursery only.
* micro-opt: replace stmGetEnclosingTRec() with a field access (Simon Marlow, 2009-10-14, 1 file, -5/+5)
| While fixing #3578 I noticed that this function was just a field access to StgTRecHeader, so I inlined it manually.
* Add a way to generate tracing events programmatically (Simon Marlow, 2009-09-25, 1 file, -0/+10)
| added:
|   primop TraceEventOp "traceEvent#" GenPrimOp
|     Addr# -> State# s -> State# s
|     { Emits an event via the RTS tracing framework. The contents of the event is the zero-terminated byte string passed as the first argument. The event will be emitted either to the .eventlog file, or to stderr, depending on the runtime RTS flags. }
| and added the required RTS functionality to support it. Also a bit of refactoring in the RTS tracing code.
* Fix #3429: a tricky race condition (Simon Marlow, 2009-08-18, 1 file, -4/+4)
| There were two bugs, and had it not been for the first one we would not have noticed the second one, so this is quite fortunate.
| The first bug is in stg_unblockAsyncExceptionszh_ret: when we find a pending exception to raise but don't end up raising it, an adjustment to the stack pointer was missing.
| The second bug was that this case was actually happening at all: it ought to be incredibly rare, because the pending exception thread would have to be killed between us finding it and attempting to raise the exception. This made me suspicious. It turned out that there was a race condition on the tso->flags field; multiple threads were updating this bitmask field non-atomically (one of the bits is the dirty-bit for the generational GC). The fix is to move the dirty bit into its own field of the TSO, making the TSO one word larger (sadly).
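
The shape of the second fix can be sketched in C. This is a minimal illustration, not the RTS code: SketchTSO, its field names, and the flag bit are hypothetical stand-ins for the real TSO layout. The point is that a read-modify-write on a shared bitmask can lose a concurrent update to a different bit, whereas a marker kept in its own word is set with a plain store that never touches the other flags.

```c
#include <stdint.h>

/* Hypothetical, simplified thread object; NOT the real StgTSO layout. */
typedef struct {
    uint32_t flags;  /* bitmask, read-modify-written by the owning thread */
    uint32_t dirty;  /* GC dirty marker, moved out of flags into its own word */
} SketchTSO;

#define SKETCH_FLAG_BLOCKEX 0x1u  /* illustrative flag bit, name assumed */

/* Owner thread: ordinary read-modify-write on the bitmask. */
static void sketch_set_flag(SketchTSO *t, uint32_t bit)   { t->flags |= bit; }
static void sketch_clear_flag(SketchTSO *t, uint32_t bit) { t->flags &= ~bit; }

/* Other threads: a whole-word store that never reads or rewrites flags,
   so it cannot undo a concurrent update to the bitmask. */
static void sketch_mark_dirty(SketchTSO *t) { t->dirty = 1; }
```

The cost is the extra word per thread object that the commit message mentions. A real implementation would also need whatever memory-ordering guarantees the collector relies on, which this sketch leaves out.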
* Rename primops from foozh_fast to stg_foozh (Simon Marlow, 2009-08-03, 1 file, -84/+84)
| For consistency with other RTS exported symbols
* propagate the result of atomically properly (fixes #3049) (Simon Marlow, 2009-06-24, 1 file, -4/+8)
|
* Remove the implementation of gmp primops from the rts (Duncan Coutts, 2009-06-13, 1 file, -514/+1)
|
* Convert the gmp cmm primops to use local stack allocation (Duncan Coutts, 2009-06-10, 1 file, -59/+56)
| Using global temp vars is really ugly and in the threaded case it needs slots in the StgRegTable. It'd also be pretty silly once we move the cmm primops out of the rts, into the integer-gmp package.
* Remove the unused remains of __decodeFloat (Ian Lynagh, 2009-06-02, 1 file, -26/+0)
|
* fix cut-and-pasto in mkWeakForeignEnv#, causing random segfaults (Simon Marlow, 2009-05-15, 1 file, -1/+1)
|
* Instead of a separate context-switch flag, set HpLim to zero (Simon Marlow, 2009-03-13, 1 file, -2/+4)
| This reduces the latency between a context-switch being triggered and the thread returning to the scheduler, which in turn should reduce the cost of the GC barrier when there are many cores.
| We still retain the old context_switch flag which is checked at the end of each block of allocation. The idea is that setting HpLim may fail if the target thread is modifying HpLim at the same time; the context_switch flag is a fallback. It also allows us to "context switch soon" without forcing an immediate switch, which can be costly.
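
The mechanism can be sketched in C as a bump allocator whose limit doubles as the yield signal. This is a minimal sketch under assumptions: SketchNursery and the function names are hypothetical, and the real heap check is emitted by the code generator rather than written as a C function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread allocation area; all names are illustrative. */
typedef struct {
    uintptr_t hp;             /* next free address */
    uintptr_t hp_lim;         /* end of the current block; 0 forces a yield */
    int       context_switch; /* fallback flag, checked when a block fills up */
} SketchNursery;

/* Returns the address of `bytes` fresh bytes, or 0 when the heap check fails
   and the thread has to return to the scheduler. */
static uintptr_t sketch_alloc(SketchNursery *n, size_t bytes) {
    if (n->hp + bytes > n->hp_lim) {
        return 0;             /* heap check failed */
    }
    uintptr_t p = n->hp;
    n->hp += bytes;
    return p;
}

/* Called by another thread that wants this one to stop soon. */
static void sketch_request_context_switch(SketchNursery *n) {
    n->hp_lim = 0;            /* the very next heap check now fails */
    n->context_switch = 1;    /* fallback if the store to hp_lim is overwritten */
}
```

Because every allocation already compares against the limit, zeroing it adds no new check to the fast path; the separate flag only covers the case where the target thread overwrites the limit at the same moment.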
* Allocate the right number of words in new*PinnedByteArrayzh_fast (Ian Lynagh, 2009-03-11, 1 file, -22/+31)
|
* Partial fix for #2917 (Simon Marlow, 2009-03-06, 1 file, -11/+35)
| - add newAlignedPinnedByteArray# for allocating pinned BAs with arbitrary alignment
| - the old newPinnedByteArray# now aligns to 16 bytes
| Foreign.alloca will use newAlignedPinnedByteArray#, and so might end up wasting less space than before (we used to align to 8 by default). Foreign.allocaBytes and Foreign.mallocForeignPtrBytes will get 16-byte aligned memory, which is enough to avoid problems with SSE instructions on x86, for example.
| There was a bug in the old newPinnedByteArray#: it aligned to 8 bytes, but would have failed if the header was not a multiple of 8 (fortunately it always was, even with profiling). Also we occasionally wasted some space unnecessarily due to alignment in allocatePinned().
| I haven't done anything about Foreign.malloc/mallocBytes, which will give you the same alignment guarantees as malloc() (8 bytes on Linux/x86 here).
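
The header-size pitfall described above comes down to which address gets aligned. A minimal C sketch, assuming a power-of-two alignment and a hypothetical helper name: the padding is computed from the payload address (object start plus header), so the result is correct even when the header size is not a multiple of the alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest pad such that (base + header + pad) is a multiple of `align`.
   `align` must be a power of two. The helper name is illustrative only. */
static size_t sketch_payload_pad(uintptr_t base, size_t header, size_t align) {
    uintptr_t payload = base + header;          /* where the bytes would start */
    uintptr_t misalign = payload & (align - 1); /* distance past the last boundary */
    return (size_t)((align - misalign) & (align - 1));
}
```

Since the pad is at most align - 1 bytes, an allocator using this approach would request that many extra bytes up front and place the object after the computed pad.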
* newPinnedByteArray#: align the result to 16-bytes (part of #2917) (Simon Marlow, 2009-02-19, 1 file, -4/+11)
|
* newPinnedByteArray#: align the result to 16-bytes (part of #2917) (Simon Marlow, 2009-02-19, 1 file, -11/+4)
|
* Implement #2191 (traceCcs# -- prints CCS of a value when available -- take 3) (Samuel Bronson, 2009-01-27, 1 file, -0/+20)
| In this version, I untag R1 before using it, and even enter R2 at the end rather than simply returning it (which didn't work right when R2 was a thunk).
* putMVar and takeMVar: add write_barrier() to fix race with throwTo (Simon Marlow, 2009-01-07, 1 file, -2/+8)
|
* FIX #1364: added support for C finalizers that run as soon as the value is no longer reachable (Simon Marlow, 2008-12-10, 1 file, -5/+72)
| Patch originally by Ivan Tomac <tomac@pacific.net.au>, amended by Simon Marlow:
| - mkWeakFinalizer# commoned up with mkWeakFinalizerEnv#
| - GC parameters to ALLOC_PRIM fixed
* Merging in the new codegen branch (dias@eecs.harvard.edu, 2008-08-14, 1 file, -4/+4)
| This merge does not turn on the new codegen (which only compiles a select few programs at this point), but it does introduce some changes to the old code generator.
| The high bits:
| 1. The Rep Swamp patch is finally here. The highlight is that the representation of types at the machine level has changed. Consequently, this patch contains updates across several back ends.
| 2. The new Stg -> Cmm path is here, although it appears to have a fair number of bugs lurking.
| 3. Many improvements along the CmmCPSZ path, including:
|    o stack layout
|    o some code for infotables, half of which is right and half wrong
|    o proc-point splitting
* fix via-C compilation: import ghczmprim_GHCziBool_False_closure (Simon Marlow, 2008-11-07, 1 file, -0/+1)
|
* re-instate counting of sparks converted (Simon Marlow, 2008-11-06, 1 file, -8/+2)
| lost in patch "Run sparks in batches"
* Run sparks in batches, instead of creating a new thread for each one (Simon Marlow, 2008-11-06, 1 file, -0/+22)
| Significantly reduces the overhead for par, which means that we can make use of parallelism at a much finer granularity.
* add readTVarIO :: TVar a -> IO a (Simon Marlow, 2008-10-10, 1 file, -0/+11)
|
* atomicModifyIORef: use a local cas() instead of the global lock (Simon Marlow, 2008-10-08, 1 file, -13/+16)
| This should improve scaling when using atomicModifyIORef
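
The general technique, replacing a global lock with a compare-and-swap retry loop on the one cell being modified, can be sketched with C11 atomics. This is an illustration only: the RTS uses its own cas() on closure pointers, while the sketch below uses a plain integer cell and hypothetical function names.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef intptr_t (*sketch_fn)(intptr_t);

/* Atomically replaces *cell with f(*cell) and returns the previous value.
   Only this cell is contended; no lock shared with other cells is taken. */
static intptr_t sketch_atomic_modify(_Atomic intptr_t *cell, sketch_fn f) {
    intptr_t old = atomic_load(cell);
    /* On failure, atomic_compare_exchange_weak stores the current contents
       into `old`, so the next iteration recomputes f on the fresh value. */
    while (!atomic_compare_exchange_weak(cell, &old, f(old))) {
        /* another thread won the race (or the weak CAS failed spuriously) */
    }
    return old;
}

/* Example update function. */
static intptr_t sketch_add_one(intptr_t x) { return x + 1; }
```

Threads updating different cells never wait on each other with this scheme, which is the scaling benefit the commit message refers to.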
* Move the context_switch flag into the Capability (Simon Marlow, 2008-09-19, 1 file, -2/+2)
| Fixes a long-standing bug that could in some cases cause sub-optimal scheduling behaviour.
* get exception names from Control.Exception.Base instead of Control.Exception (Ross Paterson, 2008-08-12, 1 file, -2/+2)
|
* Follow extensible exception changes (Ian Lynagh, 2008-07-30, 1 file, -2/+2)
|
* Change the calling conventions for unboxed tuples slightly (Simon Marlow, 2008-07-28, 1 file, -30/+0)
| When returning an unboxed tuple with a single non-void component, we now use the same calling convention as for returning a value of the same type as that component. This means that the return convention for IO now doesn't vary depending on the platform, which makes some parts of the RTS simpler, and fixes a problem I was having with making the FFI work in unregisterised GHCi (the byte-code compiler makes some assumptions about calling conventions to keep things simple).
* add threadStatus# primop, for querying the status of a ThreadId# (Simon Marlow, 2008-07-10, 1 file, -0/+33)
|
* oops, fix more register clobberage (Simon Marlow, 2008-07-10, 1 file, -2/+2)
| fixes crash with -threaded -debug for me
* Fix some random register clobbering in takeMVar/putMVar (Simon Marlow, 2008-07-09, 1 file, -2/+6)
| This showed up as a crash in conc032 for me.
* 64-bit fixes (Simon Marlow, 2008-06-17, 1 file, -4/+4)
|
* fix some types for 64-bit platforms (Simon Marlow, 2008-06-03, 1 file, -4/+4)
|
* add [] to foreign calls (Simon Marlow, 2008-04-16, 1 file, -11/+13)
|
* remove GRAN/PAR code (Simon Marlow, 2008-04-16, 1 file, -24/+0)
|
* Add a write barrier to the TSO link field (#1589) (Simon Marlow, 2008-04-16, 1 file, -19/+32)
|
* Fix conversions between Double/Float and simple-integer (Ian Lynagh, 2008-06-14, 1 file, -5/+9)
|
* Fix a division-by-zero when +RTS -V0 is given (Ian Lynagh, 2008-04-26, 1 file, -1/+5)
| In delayzh_fast we act as if tickInterval was 50, not 0.
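
The guard can be sketched in C as follows. The units (tick interval in milliseconds, delay in microseconds), the round-up, and all names are assumptions made for illustration; only the fallback value of 50 comes from the commit message.

```c
#include <stdint.h>

#define SKETCH_DEFAULT_TICK_MS 50u  /* value used when the interval is 0 */

/* Converts a delay in microseconds into a number of timer ticks, rounding up.
   A tick interval of 0 (as with +RTS -V0) would divide by zero, so the
   fallback interval is substituted first. */
static uint64_t sketch_delay_ticks(uint64_t delay_us, uint32_t tick_interval_ms) {
    uint64_t divisor = tick_interval_ms;
    if (divisor == 0) {
        divisor = SKETCH_DEFAULT_TICK_MS;
    }
    divisor *= 1000;                           /* milliseconds to microseconds */
    return (delay_us + divisor - 1) / divisor; /* ceiling division */
}
```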
* Fix int64ToInteger 0xFFFFFFFF00000000 on 32bit machine; trac #2223 (Ian Lynagh, 2008-04-24, 1 file, -3/+3)
| Patch from Mike Gunter.
* Add some more generic (en|de)code(Double|Float) code (Ian Lynagh, 2008-04-17, 1 file, -0/+35)
|
* Do not #include external header files when compiling via C (Simon Marlow, 2008-04-02, 1 file, -1/+1)
| This has several advantages:
| - -fvia-C is consistent with -fasm with respect to FFI declarations: both bind to the ABI, not the API.
| - foreign calls can now be inlined freely across module boundaries, since a header file is not required when compiling the call.
| - bootstrapping via C will be more reliable, because this difference in behaviour between the two backends has been removed.
| There is one disadvantage:
| - we get no checking by the C compiler that the FFI declaration is correct.
| So now, the c-includes field in a .cabal file is always ignored by GHC, as are header files specified in an FFI declaration. This was previously the case only for -fasm compilations; now it is also the case for -fvia-C.
* Link libgmp.a statically into libHSrts.dll on Windows (Clemens Fruhwirth, 2008-01-01, 1 file, -0/+2)
|
* forkIO starts the new thread blocked if the parent is blocked (#1048) (Simon Marlow, 2007-12-04, 1 file, -0/+12)
|
* fix warnings when compiling via C (Simon Marlow, 2007-10-18, 1 file, -4/+4)
|
* Add a proper write barrier for MVars (Simon Marlow, 2007-10-11, 1 file, -20/+45)
| Previously MVars were always on the mutable list of the old generation, which meant every MVar was visited during every minor GC. With lots of MVars hanging around, this gets expensive. We addressed this problem for MUT_VARs (aka IORefs) a while ago; the solution is to use a traditional GC write-barrier when the object is modified. This patch does the same thing for MVars.
| TVars are still done the old way; they could probably benefit from the same treatment too.
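
A traditional write barrier of the kind described can be sketched in C. This is a simplified model with hypothetical types and a fixed-capacity list, not the RTS data structures: an object is recorded on the mutable list the first time it is written after a collection, so a minor GC visits only the objects that were actually modified.

```c
#include <stddef.h>

#define SKETCH_MUT_LIST_MAX 16  /* fixed capacity, for the sketch only */

/* Hypothetical stand-ins for an MVar and a generation's mutable list. */
typedef struct { int dirty; void *value; } SketchMVar;
typedef struct { SketchMVar *items[SKETCH_MUT_LIST_MAX]; size_t len; } SketchMutList;

/* Write barrier: record the object once, on its first write since the last GC.
   Returns 0 only if the fixed-size list is full (a real list would grow). */
static int sketch_write_mvar(SketchMutList *ml, SketchMVar *mv, void *v) {
    if (!mv->dirty) {
        if (ml->len == SKETCH_MUT_LIST_MAX) {
            return 0;
        }
        ml->items[ml->len++] = mv;
        mv->dirty = 1;
    }
    mv->value = v;
    return 1;
}

/* Minor GC: visit only the recorded objects, mark them clean, reset the list.
   Returns how many objects were visited. */
static size_t sketch_minor_gc(SketchMutList *ml) {
    size_t visited = ml->len;
    for (size_t i = 0; i < ml->len; i++) {
        ml->items[i]->dirty = 0;
    }
    ml->len = 0;
    return visited;
}
```

With many objects that are rarely written, the work per minor GC is proportional to the number of writes since the previous collection rather than to the total number of objects.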
* {Enter,Leave}CriticalSection imports should be outside #ifdef __PIC__ (Simon Marlow, 2007-09-05, 1 file, -1/+1)
|
* FIX: Correct Leave/EnterCriticalSection imports (Manuel M T Chakravarty, 2007-09-05, 1 file, -2/+2)
|
* put the @N suffix on stdcall foreign calls in .cmm code (Simon Marlow, 2007-09-04, 1 file, -0/+2)
| This applies to EnterCriticalSection and LeaveCriticalSection in the RTS
* Windows: remove the {Enter,Leave}CriticalSection wrappers (Simon Marlow, 2007-08-29, 1 file, -2/+2)
| The C-- parser was missing the "stdcall" calling convention for foreign calls, but once added we can call {Enter,Leave}CriticalSection directly.
* annotate C-- calls that do not return (Norman Ramsey, 2007-08-20, 1 file, -6/+6)
| * The correct definition of C-- requires that a procedure not 'fall off the end'. The 'never returns' annotation tells us if a (foreign) call is not going to return.
| Validated!