path: root/rts/PrimOps.cmm
Commit message (author, date; files changed, -lines/+lines)
* CMM: add a mechanism to import C .data labels (Sergei Trofimovich, 2015-01-19; 1 file, -3/+3)

    Summary: This introduces new .cmm syntax for imports:

        'import' 'CLOSURE' <identifier>;

    Currently cmm syntax allows importing only function labels:

        import pthread_mutex_lock;

    but sometimes ghc needs to import global variables or haskell closures:

        import ghczmprim_GHCziTypes_True_closure;
        import base_ControlziExceptionziBase_nestedAtomically_closure;
        import ghczmprim_GHCziTypes_False_closure;
        import sm_mutex;

    This breaks on ia64, where pointers to data and pointers to functions
    differ. The patch fixes the threaded runtime on ia64, where a dereference
    of 'sm_mutex' from CMM resolved to an incorrect location. The exact
    breakage mechanics are the same as in
    e18525fae273f4c1ad8d6cbe1dea4fc074cac721.

    Merge into the 7.10 branch.

    Signed-off-by: Sergei Trofimovich <siarheit@google.com>

    Test Plan: passes ./validate, makes ghci work on ghc-7.8.4

    Reviewers: simonmar, simonpj, austin

    Reviewed By: austin

    Subscribers: thomie

    Differential Revision: https://phabricator.haskell.org/D622
* Revert "rts/PrimOps.cmm: follow '_static_closure' update"Sergei Trofimovich2014-10-211-1/+1
| | | | | | This reverts commit eb191ab6c85f4b668a6e9151dcecaf1f1e7ec7c2. Follows revert of STATIC_CLOSURE and restores UNREG build.
* Revert "Rename _closure to _static_closure, apply naming consistently."Edward Z. Yang2014-10-201-4/+4
| | | | | | | This reverts commit 35672072b4091d6f0031417bc160c568f22d0469. Conflicts: compiler/main/DriverPipeline.hs
* rts/PrimOps.cmm: follow '_static_closure' update (Sergei Trofimovich, 2014-10-02; 1 file, -1/+1)

    Caught by UNREG build failure:

        rts_dist_HC rts/dist/build/PrimOps.o
        /tmp/ghc8613_0/ghc8613_2.hc: In function 'cf8_entry':
        /tmp/ghc8613_0/ghc8613_2.hc:1942:13:
            error: 'base_ControlziExceptionziBase_nestedAtomically_static_closure'
            undeclared (first use in this function)
            R1.w = (W_)&base_ControlziExceptionziBase_nestedAtomically_static_closure;
                    ^

    Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
* Rename _closure to _static_closure, apply naming consistently. (Edward Z. Yang, 2014-10-01; 1 file, -4/+4)

    Summary: In preparation for indirecting all references to closures, we
    rename _closure to _static_closure to ensure any old code will get an
    undefined symbol error. In order to reference a closure foobar_closure
    (which is now undefined), you should instead use STATIC_CLOSURE(foobar).
    For convenience, a number of these old identifiers are macro'd.

    Across C-- and C (Windows and otherwise), there were differing
    conventions on whether foobar_closure or &foobar_closure was the address
    of the closure. Now, all foobar_closure references are addresses, and no
    & is necessary.

    CHARLIKE/INTLIKE were not changed, simply alpha-renamed.

    Part of the remove-HEAP_ALLOCED patch set (#8199)

    Depends on D265

    Signed-off-by: Edward Z. Yang <ezyang@mit.edu>

    Test Plan: validate

    Reviewers: simonmar, austin

    Subscribers: simonmar, ezyang, carter, thomie

    Differential Revision: https://phabricator.haskell.org/D267

    GHC Trac Issues: #8199
* `M-x delete-trailing-whitespace` & `M-x untabify` (Herbert Valerio Riedel, 2014-09-24; 1 file, -2/+2)
|
* Implement `decodeDouble_Int64#` primop (Herbert Valerio Riedel, 2014-09-17; 1 file, -0/+17)

    The existing `decodeDouble_2Int#` primop is rather inconvenient to use
    (and in fact is not even used by `integer-gmp`) as the mantissa is split
    into 3 components which would actually fit in an `Int64#` value.

    However, `decodeDouble_Int64#` is to be used by the new `integer-gmp2`
    re-implementation (see #9281).

    Moreover, `decodeDouble_2Int#` performs direct bit-wise operations on
    the IEEE representation which can be replaced by a combination of the
    portable standard C99 `scalbn(3)` and `frexp(3)` functions.

    Differential Revision: https://phabricator.haskell.org/D160
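    A rough Haskell-level illustration of the decomposition this primop
    performs, using the boxed Prelude decodeFloat (the unboxed primop
    returns the same mantissa/exponent pair, with the whole mantissa in a
    single Int64#); this sketch is not part of the patch:

        -- decodeFloat x yields (m, e) with fromIntegral m * 2^^e == x.
        main :: IO ()
        main = print (decodeFloat (0.75 :: Double))
        -- prints (6755399441055744,-53), i.e. 0.75 == m * 2^-53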
* Revert "Fix typos 'resizze'"Gabor Greif2014-08-161-1/+1
| | | | | | this is z-encoding (as hvr tells me) This reverts commit 425d5178af55620efa00e6e16426f491c63ad533.
* Fix typos 'resizze' (Gabor Greif, 2014-08-16; 1 file, -1/+1)
|
* Implement {resize,shrink}MutableByteArray# primops (Herbert Valerio Riedel, 2014-08-16; 1 file, -0/+54)

    The two new primops with the type signatures

        resizeMutableByteArray# :: MutableByteArray# s -> Int#
                                -> State# s -> (# State# s, MutableByteArray# s #)

        shrinkMutableByteArray# :: MutableByteArray# s -> Int#
                                -> State# s -> State# s

    allow resizing MutableByteArray#s in-place (when possible), and are
    useful for algorithms where memory is temporarily over-allocated. The
    motivating use-case is for implementing integer backends, where the
    final target size of the result is either N or N+1, and only known
    after the operation has been performed.

    A future commit will implement a stateful variant of the
    `sizeofMutableByteArray#` operation (see #9447 for details), since now
    the size of a `MutableByteArray#` may change over its lifetime (i.e.
    before it gets frozen or GCed).

    Test Plan: ./validate --slow

    Reviewers: ezyang, austin, simonmar

    Reviewed By: austin, simonmar

    Differential Revision: https://phabricator.haskell.org/D133
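    A minimal sketch of the over-allocate-then-shrink pattern these primops
    enable, written against GHC.Exts; the helper name and the sizes are
    illustrative, not taken from the patch:

        {-# LANGUAGE MagicHash, UnboxedTuples #-}
        import GHC.Exts
        import GHC.IO (IO(..))

        -- Allocate 16 bytes up front, then shrink the buffer in place to
        -- 8 bytes once the final result size is known.
        overAllocThenShrink :: IO ()
        overAllocThenShrink = IO $ \s0 ->
          case newByteArray# 16# s0 of
            (# s1, mba #) ->
              case shrinkMutableByteArray# mba 8# s1 of
                s2 -> (# s2, () #)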
* Re-add more primops for atomic ops on byte arrays (Johan Tibell, 2014-06-30; 1 file, -12/+0)

    This is the second attempt to add this functionality. The first attempt
    was reverted in 950fcae46a82569e7cd1fba1637a23b419e00ecd, due to
    register allocator failure on x86. Given how the register allocator
    currently works, we don't have enough registers on x86 to support
    cmpxchg using complicated addressing modes. Instead we fall back to a
    simpler addressing mode on x86.

    Adds the following primops:

    * atomicReadIntArray#
    * atomicWriteIntArray#
    * fetchSubIntArray#
    * fetchOrIntArray#
    * fetchXorIntArray#
    * fetchAndIntArray#

    Makes these pre-existing out-of-line primops inline:

    * fetchAddIntArray#
    * casIntArray#
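    A small sketch of what these operations look like from Haskell once
    exposed through GHC.Exts; the wrapper is illustrative and deliberately
    does not rely on whether fetchAddIntArray# reports the old or the new
    value of the element:

        {-# LANGUAGE MagicHash, UnboxedTuples #-}
        import GHC.Exts
        import GHC.IO (IO(..))

        -- Atomically bump the counter stored in word 0 of a fresh
        -- MutableByteArray# and return the Int# the operation reports.
        bumpCounter :: IO Int
        bumpCounter = IO $ \s0 ->
          case newByteArray# 8# s0 of                  -- one word on 64-bit
            (# s1, mba #) ->
              case atomicWriteIntArray# mba 0# 0# s1 of
                s2 ->
                  case fetchAddIntArray# mba 0# 1# s2 of
                    (# s3, reported #) -> (# s3, I# reported #)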
* Revert "Add more primops for atomic ops on byte arrays"Johan Tibell2014-06-261-0/+12
| | | | | | | | This commit caused the register allocator to fail on i386. This reverts commit d8abf85f8ca176854e9d5d0b12371c4bc402aac3 and 04dd7cb3423f1940242fdfe2ea2e3b8abd68a177 (the second being a fix to the first).
* Add more primops for atomic ops on byte arrays (Johan Tibell, 2014-06-24; 1 file, -12/+0)

    Summary: Add more primops for atomic ops on byte arrays

    Adds the following primops:

    * atomicReadIntArray#
    * atomicWriteIntArray#
    * fetchSubIntArray#
    * fetchOrIntArray#
    * fetchXorIntArray#
    * fetchAndIntArray#

    Makes these pre-existing out-of-line primops inline:

    * fetchAddIntArray#
    * casIntArray#
* Fix missing unlockClosure() call in tryReadMVar (#9148) (Simon Marlow, 2014-05-30; 1 file, -0/+1)
|
* Per-capability nursery weak pointer lists, fixes #9075 (Edward Z. Yang, 2014-05-29; 1 file, -4/+5)

    Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
* PrimOps.cmm: whitespace only (Johan Tibell, 2014-03-29; 1 file, -438/+439)

    Harmonize the indentation amount. The file mixed 4, 2, and in some
    cases 3 spaces for indentation.
* Add SmallArray# and SmallMutableArray# types (Johan Tibell, 2014-03-29; 1 file, -0/+118)

    These array types are smaller than Array# and MutableArray# and are
    faster when the array size is small, as they don't have the overhead of
    a card table. Having no card table reduces the closure size by 2 words
    in the typical small array case and leads to less work when updating or
    GCing the array.

    Reduces both the runtime and memory allocation by 8.8% on my insert
    benchmark for the HashMap type in the unordered-containers package,
    which makes use of lots of small arrays. With tuned GC settings
    (i.e. `+RTS -A6M`) the runtime reduction is 15%.

    Fixes #8923.
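    A minimal sketch of the new primops in use (the helper name and sizes
    are illustrative, not from the patch):

        {-# LANGUAGE MagicHash, UnboxedTuples #-}
        import GHC.Exts
        import GHC.IO (IO(..))

        -- Build a 4-element small array and freeze it; small arrays carry
        -- no card table, which is what makes them cheaper at this size.
        mkSmall :: IO ()
        mkSmall = IO $ \s0 ->
          case newSmallArray# 4# 'x' s0 of
            (# s1, sma #) ->
              case writeSmallArray# sma 0# 'a' s1 of
                s2 ->
                  case unsafeFreezeSmallArray# sma s2 of
                    (# s3, _frozen #) -> (# s3, () #)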
* Make copy array ops out-of-line by default (Johan Tibell, 2014-03-28; 1 file, -0/+20)

    This should reduce code size when there's little to gain from inlining
    these primops, while still retaining the inlining benefit when the size
    of the copy is known statically.
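    For illustration, a copy whose length is a compile-time literal (the
    statically-known case the commit message refers to); the helper is a
    sketch, not from the patch:

        {-# LANGUAGE MagicHash, UnboxedTuples #-}
        import GHC.Exts
        import GHC.IO (IO(..))

        -- Copy the first four elements of an immutable array into a fresh
        -- mutable one; the literal 4# is a size the code generator can see.
        copyFour :: Array# Int -> IO ()
        copyFour src = IO $ \s0 ->
          case newArray# 4# 0 s0 of
            (# s1, dst #) ->
              case copyArray# src 0# dst 0# 4# s1 of
                s2 -> (# s2, () #)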
* codeGen: inline allocation optimization for clone array primops (Johan Tibell, 2014-03-22; 1 file, -0/+21)

    The inline allocation version is 69% faster than the out-of-line
    version, when cloning an array of 16 unit elements on a 64-bit machine.

    Comparing the new and the old primop implementations isn't
    straightforward. The old version had a missing heap check that I
    discovered during the development of the new version. Comparing the old
    and the new version would require fixing the old version, which in turn
    means reimplementing the equivalent of MAYBE_CG in StgCmmPrim.

    The inline allocation threshold is configurable via
    -fmax-inline-alloc-size which gives the maximum array size, in bytes,
    to allocate inline. The size does not include the closure header size.

    Allowing the same primop to be either inline or out-of-line has some
    implication for how we lay out heap checks. We always place a heap
    check around out-of-line primops, as they may allocate outside of our
    knowledge. However, for the inline primops we only allow allocation via
    the standard means (i.e. virtHp). Since the clone primops might be
    either inline or out-of-line, the heap check layout code now consults
    shouldInlinePrimOp to know whether a primop will be inlined.
* Don't use gcptr for interior pointers (Johan Tibell, 2014-03-20; 1 file, -10/+8)

    gcptr should only be used for pointers that the GC should follow. While
    this didn't cause any bugs right now, since these variables aren't live
    over a GC, it's clearer to use the right type.
* Fix two issues in stg_newArrayzh (Johan Tibell, 2014-03-13; 1 file, -18/+4)

    The implementations of newArray# and newArrayArray#, stg_newArrayzh and
    stg_newArrayArrayzh, had three issues:

    * The condition for the loop that fills the array with the initial
      element was incorrect. It would write into the card table as well.
      The loop that filled the card table was never executed, as its
      condition was also wrong. In the end this didn't lead to any
      disasters, as the value of the card table doesn't matter for newly
      allocated arrays.

    * The card table was unnecessarily initialized. The card table is only
      used when the array isn't copied, which new arrays always are. By not
      writing the card table at all we save some cycles.

    * The ticky allocation accounting was wrong. The second argument to
      TICK_ALLOC_PRIM is the size of the closure excluding the header size,
      but the header size was incorrectly included.

    Fixes #8867.
* Add a way to reserve temporary stack space in high-level Cmm (Simon Marlow, 2014-01-16; 1 file, -22/+33)

    We occasionally need to reserve some temporary memory in a primop for
    passing to a foreign function. We've been using the stack for this, but
    when we moved to high-level Cmm it became quite fragile because primops
    are in high-level Cmm and the stack is supposed to be under the control
    of the Cmm pipeline.

    So this change puts things on a firmer footing by adding a new Cmm
    construct 'reserve'. e.g. in decodeFloat_Int#:

        reserve 2 = tmp {
          mp_tmp1  = tmp + WDS(1);
          mp_tmp_w = tmp;

          /* Perform the operation */
          ccall __decodeFloat_Int(mp_tmp1 "ptr", mp_tmp_w "ptr", arg);

          r1 = W_[mp_tmp1];
          r2 = W_[mp_tmp_w];
        }

    reserve is described in CmmParse.y.

    Unfortunately the argument to reserve must be a compile-time constant.
    We might have to extend the parser to allow expressions with arithmetic
    operators if this is too restrictive.

    Note also that the return instruction for the procedure must be outside
    the scope of the reserved stack area, so we have to extract the values
    from the reserved area before we close the scope. This means some more
    local variables (r1, r2 in the example above). The generated code is
    more or less identical to what we had before though.
* Remove use of R9, and fix associated bugs (Simon Marlow, 2013-10-01; 1 file, -1/+1)

    We were passing the function address to stg_gc_prim_p in R9, which was
    wrong because the call was a high-level call and didn't declare R9 as a
    parameter. Passing R9 as an argument is the right way, but
    unfortunately that exposed another bug: we were using the same macro in
    some low-level Cmm, where it is illegal to call functions with
    arguments (see Note [syntax of cmm files]). So we now have low-level
    variants of STK_CHK() and STK_CHK_P() for use in low-level Cmm code.
* Merge remote-tracking branch 'origin/master' into ghc-parmake-gsoc [ghc-parmake-gsoc] (Patrick Palka, 2013-09-08; 1 file, -8/+8)
|\
| * Avoid allocating while holding a lock (#8242) (Takano Akio, 2013-09-08; 1 file, -8/+8)

    This reverts commit 6770663f764db76dbb7138ccb3aea0527d194151.

    If the program enters the garbage collector with the closure lock held,
    it will confuse the garbage collector and will result in an infinite
    loop in evacuate().

    Signed-off-by: Austin Seipp <aseipp@pobox.com>
* | Merge remote-tracking branch 'origin/master' into ghc-parmake-gsoc (Patrick Palka, 2013-09-04; 1 file, -2/+58)
|\ \
| |/
| * minor: remove tabs from file [atomics] (Ryan Newton, 2013-08-31; 1 file, -4/+4)
| |
| * minor bugfix to casIntArray# and fetchAddIntArray# (Ryan Newton, 2013-08-22; 1 file, -4/+6)
| |
| * Eliminate atomic_inc_by and instead modify atomic_inc. (Ryan Newton, 2013-08-21; 1 file, -1/+1)
| |
| * Add PrimOp fetchAddIntArray# plus supporting C function atomic_inc_by. (Ryan Newton, 2013-08-21; 1 file, -1/+13)
| |
| * Add PrimOp: casIntArray#. Modify casMutVar# for 'ticketed' style. (Ryan Newton, 2013-08-21; 1 file, -2/+22)
| |
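    A sketch of the byte-array CAS from the Haskell side (the wrapper is
    illustrative, not part of the patch): casIntArray# hands back the value
    that was in the cell before the operation, so the caller compares it
    with the expected value to see whether the swap took effect, much as
    the ticketed casMutVar# hands back the value now in the MutVar#.

        {-# LANGUAGE MagicHash, UnboxedTuples #-}
        import GHC.Exts
        import GHC.IO (IO(..))

        -- Try to replace the Int in word 0 of a byte array, returning the
        -- value observed there before any write.
        casWord0 :: MutableByteArray# RealWorld -> Int -> Int -> IO Int
        casWord0 mba (I# expected) (I# new) = IO $ \s0 ->
          case casIntArray# mba 0# expected new s0 of
            (# s1, old #) -> (# s1, I# old #)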
| * Update stg_casArrayzh to conform to new CMM conventions. (Ryan Newton, 2013-08-21; 1 file, -9/+6)
| |
| * Tweak stg_casArrayzh as per Simon Marlow's suggestion. (Ryan Newton, 2013-08-21; 1 file, -6/+4)
| |
| * add casArray# primop, similar to casMutVar# but for array elements (Ryan Newton, 2013-08-21; 1 file, -0/+27)
| |
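    A sketch of using the new primop from Haskell (the helper is
    illustrative; the flag/value pair mirrors the ticketed casMutVar#
    result shape):

        {-# LANGUAGE MagicHash, UnboxedTuples #-}
        import GHC.Exts
        import GHC.IO (IO(..))

        -- Try to swap element 0 from 'expected' to 'new'.  The result
        -- carries a success flag and the element now in the slot, which a
        -- caller keeps as the ticket for a retry loop.
        casElem0 :: MutableArray# RealWorld a -> a -> a -> IO (Int, a)
        casElem0 arr expected new = IO $ \s0 ->
          case casArray# arr 0# expected new s0 of
            (# s1, flag, current #) -> (# s1, (I# flag, current) #)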
* | UniqSupply: make mkSplitUniqSupply thread-safe (Patrick Palka, 2013-08-26; 1 file, -0/+5)
|/
    unsafeInterleaveIO is used instead of unsafeDupableInterleaveIO because
    a mk_supply thunk that is simultaneously entered by two threads should
    evaluate to the same UniqSupply.

    The UniqSupply counter is now incremented atomically using the RTS's
    atomic_inc().

    To mitigate the extra overhead of unsafeInterleaveIO in the
    single-threaded compiler, noDuplicate# is changed to exit early when
    n_capabilities == 1.
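    A generic sketch of the lazy-supply pattern described above (plain
    IORef-based code, not GHC's actual UniqSupply implementation; the names
    are invented):

        import Data.IORef (IORef, newIORef, atomicModifyIORef')
        import System.IO.Unsafe (unsafeInterleaveIO)

        -- An infinite lazy stream of fresh Ints.  Each tail is an
        -- unsafeInterleaveIO thunk, so two threads forcing the same thunk
        -- see the same value, and the counter bump itself is atomic.
        lazySupply :: IORef Int -> IO [Int]
        lazySupply counter = unsafeInterleaveIO $ do
          n    <- atomicModifyIORef' counter (\c -> (c + 1, c))
          rest <- lazySupply counter
          return (n : rest)

        newSupply :: IO [Int]
        newSupply = newIORef 0 >>= lazySupply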
* Fix bug in readMVar implementation: keep clean MVars clean. (Edward Z. Yang, 2013-07-17; 1 file, -2/+2)

    The readMVar implementation had only partially implemented a
    micro-optimization which allows us to avoid adding an MVar to the
    mutable list if the MVar was not changed. However, this was not applied
    to the release method on the fast path, resulting in dirty MVars which
    were not added to the mutable list.

    Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Rename atomicReadMVar and friends to readMVar. (Edward Z. Yang, 2013-07-12; 1 file, -7/+7)

    Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Implement tryAtomicReadMVar#. (Edward Z. Yang, 2013-07-10; 1 file, -0/+16)

    Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Don't call dirty_MVAR on atomicReadMVar unless we change the MVar. (Edward Z. Yang, 2013-07-10; 1 file, -4/+4)

    Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Add LOCK_CLOSURE macro for use in C--, which inlines the capability check. (Edward Z. Yang, 2013-07-10; 1 file, -48/+8)

    This patch also tweaks lockClosure to be INLINE_HEADER, so C-- clients
    don't accidentally use it, and updates some other code which locks
    closures to do the capability check.

    Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Implement atomicReadMVar, fixing #4001. (Edward Z. Yang, 2013-07-09; 1 file, -3/+76)

    We add the invariant to the MVar blocked threads queue that threads
    blocked on an atomic read are always at the front of the queue. This
    invariant is easy to maintain, since takers are only ever added to the
    end of the queue.

    Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
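    At the library level the effect is that the new operation (exposed as
    atomicReadMVar, later renamed to readMVar in the commit above) is a
    single atomic RTS operation rather than a takeMVar followed by a
    putMVar; a small usage illustration, not taken from the patch:

        import Control.Concurrent.MVar (newMVar, readMVar)

        -- readMVar blocks until the MVar is full and returns its contents
        -- without ever emptying it, so no other MVar operation can slip in
        -- between a take and a put as with the old non-atomic readMVar.
        main :: IO ()
        main = do
          box <- newMVar (42 :: Int)
          x   <- readMVar box
          print x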
* Optimise lockClosure when n_capabilities == 1; fixes #693 (Ian Lynagh, 2013-06-15; 1 file, -4/+24)

    Based on a patch from Yuras Shumovich.
* Maintain per-generation lists of weak pointers (#7847) (Takano Akio, 2013-06-15; 1 file, -2/+2)
|
* Check for a weak pointer being dead before we do any allocation for it (Ian Lynagh, 2013-06-15; 1 file, -8/+8)
|
* Allow multiple C finalizers to be attached to a Weak# (Takano Akio, 2013-06-15; 1 file, -56/+57)

    The commit replaces mkWeakForeignEnv# with addCFinalizerToWeak#. This
    new primop mutates an existing Weak# object and adds a new C finalizer
    to it.

    This change removes an invariant in MarkWeak.c, namely that the
    relative order of Weak# objects in the list needs to be preserved
    across GC. This makes it easier to split the list into per-generation
    structures.

    The patch also removes a race condition between two threads calling
    finalizeWeak# on the same WEAK object at the same time.
* Optimization for takeMVar/putMVar when MVar left empty; fixes #7923 (Ian Lynagh, 2013-06-15; 1 file, -20/+29)

    We only need to apply the write barrier to an MVar when it acquires a
    reference to live data; when the MVar is left empty in the case of a
    takeMVar/putMVar, we can save a memory reference.

    Patch from Edward Z. Yang.
* Fix a comment (Ian Lynagh, 2013-06-09; 1 file, -3/+1)
|
* Whitespace only (Ian Lynagh, 2013-06-09; 1 file, -124/+124)
|
* Separate StablePtr and StableName tables (#7674) (Simon Marlow, 2013-02-14; 1 file, -11/+11)

    To improve performance of StablePtr.
* Add a write barrier for TVAR closures (Simon Marlow, 2012-11-16; 1 file, -1/+1)

    This improves GC performance when there are a lot of TVars in the heap.
    For instance, a TChan with a lot of elements causes a massive GC drag
    without this patch.

    There's more to do - several other STM closure types don't have write
    barriers, so GC performance when there are a lot of threads blocked on
    STM isn't great. But fixing the problem for TVar is a good start.