path: root/compiler/codeGen
...
* Add two new primops: (Simon Marlow, 2011-06-28; 3 files, -0/+46)

      seq#   :: a -> State# s -> (# State# s, a #)
      spark# :: a -> State# s -> (# State# s, a #)

  seq# is a version of seq that can be used in a State#-passing context. We
  will use it to implement Control.Exception.evaluate and thus fix #5129.
  Also we have plans to use it to fix #5262.

  spark# is to seq# as par is to pseq. That is, it creates a spark in a
  State#-passing context. We will use spark# and seq# to implement rpar and
  rseq respectively in an improved implementation of the Eval monad.
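  A minimal sketch (editorial illustration, not part of the commit) of how
  rseq and rpar could sit on top of the new primops, assuming a GHC that
  exposes them (e.g. via GHC.Exts); the Eval carrier shown here is assumed
  and is not the actual Control.Parallel code:

      {-# LANGUAGE MagicHash, UnboxedTuples #-}
      import GHC.Exts

      -- Hypothetical, simplified Eval carrier: a State#-passing computation.
      newtype Eval a = Eval (State# RealWorld -> (# State# RealWorld, a #))

      -- rseq evaluates its argument to WHNF inside the state thread.
      rseq :: a -> Eval a
      rseq x = Eval (\s -> seq# x s)

      -- rpar creates a spark for its argument inside the state thread.
      rpar :: a -> Eval a
      rpar x = Eval (\s -> spark# x s)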
* codeGen: Make emitCopyByteArray less pessimistic (Johan Tibell, 2011-06-17; 2 files, -19/+2)

  Assigning the arguments to temporaries was only needed in the case of
  emitCopyArray, where the arguments are alive across the call. That is not
  the case in emitCopyByteArray.

  Signed-off-by: David Terei <davidterei@gmail.com>
* Port "Add byte array copy primops" to the new code genJohan Tibell2011-06-161-0/+57
| | | | Signed-off-by: David Terei <davidterei@gmail.com>
* Add byte array copy primops (Johan Tibell, 2011-06-16; 1 file, -0/+59)

  Signed-off-by: David Terei <davidterei@gmail.com>
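  Editorial illustration (assumed wrapper, not code from this commit): one of
  the byte-array copy primops this adds, copyByteArray#, can be wrapped as a
  safe ST action along these lines:

      {-# LANGUAGE MagicHash, UnboxedTuples #-}
      import GHC.Exts
      import GHC.ST (ST (..))

      -- Copy len bytes from an immutable ByteArray# into a MutableByteArray#.
      copyBytes :: ByteArray# -> Int -> MutableByteArray# s -> Int -> Int -> ST s ()
      copyBytes src (I# soff) dst (I# doff) (I# len) =
        ST (\s -> (# copyByteArray# src soff dst doff len s, () #))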
* Port "6c7d2a9 Use the new memcpy/memmove/memset MachOps" to new codegen.Edward Z. Yang2011-06-151-37/+23
| | | | Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Use the new memcpy/memmove/memset MachOps (Johan Tibell, 2011-06-14; 1 file, -24/+25)

  Signed-off-by: David Terei <davidterei@gmail.com>
* Remove type synonyms for CmmFormals, CmmActuals (and hinted versions). (Edward Z. Yang, 2011-06-13; 6 files, -18/+18)

  Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Port "Make array copy primops inline" and related patches to new codegen.Edward Z. Yang2011-06-135-4/+234
| | | | | | | | | | The following patches were ported: d0faaa6 Fix segfault in array copy primops on 32-bit 18691d4 Make assignTemp_ less pessimistic 9c23f06 Make array copy primops inline Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Fix segfault in array copy primops on 32-bit (Johan Tibell, 2011-06-07; 1 file, -4/+4)

  The second argument to C's memset was passed as a W8 while memset expects
  an int.

  Signed-off-by: David Terei <davidterei@gmail.com>
* Make assignTemp_ less pessimistic (Johan Tibell, 2011-05-30; 1 file, -6/+10)

  assignTemp_ is intended to make sure that the expression gets assigned to a
  temporary in case that's needed in order to avoid a register getting
  trashed due to a function call.
* Make array copy primops inline (Johan Tibell, 2011-05-19; 2 files, -3/+228)
* Amend comment per Marlow's comments. (Edward Z. Yang, 2011-05-16; 1 file, -15/+16)

  Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Work around lack of saving volatile registers from unsafe foreign calls. (Edward Z. Yang, 2011-05-15; 1 file, -0/+61)

  Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* For BC labels, emit empty data section instead of empty proc. (Edward Z. Yang, 2011-04-14; 2 files, -2/+3)

  This fixes two bugs:

    - The new code generator doesn't like procedures with empty graphs, and
      panicked in labelAGraph.
    - LLVM optimizes away empty procedures but not empty data sections, so
      now the backwards-compatibility labels actually work with -fllvm.

  Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Change the way module initialisation is done (#3252, #4417) (Simon Marlow, 2011-04-12; 6 files, -432/+54)

  Previously the code generator generated small code fragments labelled with
  __stginit_M for each module M, and these performed whatever initialisation
  was necessary for that module and recursively invoked the initialisation
  functions for imported modules. This approach had drawbacks:

    - FFI users had to call hs_add_root() to ensure the correct
      initialisation routines were called. This is a non-standard, and ugly,
      API.
    - Unless we were using -split-objs, the __stginit dependencies would
      entail linking the whole transitive closure of modules imported,
      whether they were actually used or not. In an extreme case (#4387,
      #4417), a module from GHC might be imported for use in Template Haskell
      or an annotation, and that would force the whole of GHC to be
      needlessly linked into the final executable.

  So now instead we do our initialisation with C functions marked with
  __attribute__((constructor)), which are automatically invoked at program
  startup time (or DSO load time). The C initialisers are emitted into the
  stub.c file. This means that every time we compile with -prof or -hpc, we
  now get a stub file, but thanks to #3687 that is now invisible to the user.

  There are some refactorings in the RTS (particularly for HPC) to handle the
  fact that initialisers now get run earlier than they did before.

  The __stginit symbols are still generated, and the hs_add_root() function
  still exists (but does nothing), for backwards compatibility.
* Remove debugging CmmComment from old code generator. (Edward Z. Yang, 2011-04-11; 1 file, -1/+0)

  Warning: this change seems to tickle a bug in the ghc-stage1 compiler built
  with GHC 6.12.1 during validate.

  Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Minor documentation improvement about pointer tagging. (Edward Z. Yang, 2011-04-04; 1 file, -3/+5)

  Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Immediately tag initialization code to prevent untagged spills. (Edward Z. Yang, 2011-03-23; 3 files, -6/+14)

  When allocating new objects on the heap, we previously returned a CmmExpr
  containing the heap pointer as well as the tag expression, which would be
  added to the code graph upon first usage. Unfortunately, this meant that
  untagged heap pointers living in registers might be spilled to the stack,
  where they interacted poorly with garbage collection (we saw this bug
  specifically with the compacting garbage collector.)

  This fix immediately tags the register containing the heap pointer, so that
  unless we have extremely unfriendly spill code, the new pointer will never
  be spilled to the stack untagged.

  An alternate solution might have been to modify allocDynClosure to tag the
  pointer upon the initial register allocation, but not all invocations of
  allocDynClosure tag the resulting pointer, and threading the consequent
  CgIdInfo for the cases that did would have been annoying.
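  For context (editorial note, not from the commit): dynamic pointer tagging
  stores a small tag in the low bits of a closure pointer, so "tagging" the
  register is plain pointer arithmetic. A self-contained sketch of that
  arithmetic, with all names here invented for illustration:

      import Data.Bits (complement, shiftL, (.&.))

      -- On a 64-bit target the low 3 bits of a closure pointer carry the tag
      -- (2 bits on 32-bit targets).
      tagBits :: Int
      tagBits = 3

      -- Tagging a freshly allocated closure is just an add ...
      tagPtr :: Int -> Int -> Int
      tagPtr addr tag = addr + tag

      -- ... and untagging masks the low bits off again.
      untagPtr :: Int -> Int
      untagPtr ptr = ptr .&. complement ((1 `shiftL` tagBits) - 1)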
* Fix Array sizeof primops to use the correct offset (which happens to be 0, so it worked before anyway). Makes us more future-proof, at least. (Daniel Peebles, 2011-02-01; 2 files, -2/+2)
* Add sizeof(Mutable)Array# primitives (Daniel Peebles, 2011-01-26; 2 files, -0/+10)
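  Editorial illustration (assumed wrapper, not code from this commit): the
  sizeofArray# primitive named in the title can be used from ordinary code
  like this:

      {-# LANGUAGE MagicHash #-}
      import GHC.Exts

      -- Number of elements in a boxed array, via the new sizeofArray# primop.
      arraySize :: Array# a -> Int
      arraySize arr = I# (sizeofArray# arr)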
* Merge in new code generator branch. (Simon Marlow, 2011-01-24; 38 files, -492/+541)

  This changes the new code generator to make use of the Hoopl package for
  dataflow analysis. Hoopl is a new boot package, and is maintained in a
  separate upstream git repository (as usual, GHC has its own lagging darcs
  mirror in http://darcs.haskell.org/packages/hoopl).

  During this merge I squashed recent history into one patch. I tried to
  rebase, but the history had some internal conflicts of its own which made
  rebase extremely confusing, so I gave up. The history I squashed was:

    - Update new codegen to work with latest Hoopl
    - Add some notes on new code gen to cmm-notes
    - Enable Hoopl lag package.
    - Add SPJ note to cmm-notes
    - Improve GC calls on new code generator.

  Work in this branch was done by:

    - Milan Straka <fox@ucw.cz>
    - John Dias <dias@cs.tufts.edu>
    - David Terei <davidterei@gmail.com>

  Edward Z. Yang <ezyang@mit.edu> merged in further changes from GHC HEAD and
  fixed a few bugs.
* Implement stack chunks and separate TSO/STACK objects (Simon Marlow, 2010-12-15; 2 files, -25/+25)

  This patch makes two changes to the way stacks are managed:

  1. The stack is now stored in a separate object from the TSO.

     This means that it is easier to replace the stack object for a thread
     when the stack overflows or underflows; we don't have to leave behind
     the old TSO as an indirection any more. Consequently, we can remove
     ThreadRelocated and deRefTSO(), which were a pain.

     This is obviously the right thing, but the last time I tried to do it it
     made performance worse. This time I seem to have cracked it.

  2. Stacks are now represented as a chain of chunks, rather than a single
     monolithic object.

     The big advantage here is that individual chunks are marked clean or
     dirty according to whether they contain pointers to the young
     generation, and the GC can avoid traversing clean stack chunks during a
     young-generation collection. This means that programs with deep stacks
     will see a big saving in GC overhead when using the default GC settings.

     A secondary advantage is that there is much less copying involved as the
     stack grows. Programs that quickly grow a deep stack will see big
     improvements.

     In some ways the implementation is simpler, as nothing special needs to
     be done to reclaim stack as the stack shrinks (the GC just recovers the
     dead stack chunks). On the other hand, we have to manage stack underflow
     between chunks, so there's a new stack frame (UNDERFLOW_FRAME), and we
     now have separate TSO and STACK objects. The total amount of code is
     probably about the same as before.

  There are new RTS flags:

    -ki<size>  Sets the initial thread stack size (default 1k)  Egs: -ki4k -ki2m
    -kc<size>  Sets the stack chunk size (default 32k)
    -kb<size>  Sets the stack chunk buffer size (default 1k)

  -ki was previously called just -k, and the old name is still accepted for
  backwards compatibility. These new options are documented.
* fix ticket number (#4505) (Simon Marlow, 2010-12-09; 1 file, -1/+1)
* Catch too-large allocations and emit an error message (#4505) (Simon Marlow, 2010-12-09; 1 file, -0/+10)

  This is a temporary measure until we fix the bug properly (which is
  somewhat tricky, and we think might be easier in the new code generator).
  For now we get:

    ghc-stage2: sorry! (unimplemented feature or known bug)
      (GHC version 7.1 for i386-unknown-linux):
        Trying to allocate more than 1040384 bytes.

    See: http://hackage.haskell.org/trac/ghc/ticket/4550
    Suggestion: read data from a file instead of having large static data
    structures in the code.
* make a panic message more informative and suggest -dcore-lint (see #4534) (Simon Marlow, 2010-12-01; 1 file, -4/+4)
* Remove unnecessary fromIntegral calls (simonpj@microsoft.com, 2010-11-16; 4 files, -4/+4)
* Remove unnecessary imports (Ian Lynagh, 2010-10-26; 3 files, -4/+0)
* Follow GHC.Bool/GHC.Types merge (Ian Lynagh, 2010-10-23; 1 file, -2/+2)
* Fix some whitespace (Ian Lynagh, 2010-10-21; 1 file, -16/+16)
* Use takeUniqFromSupply in emitProcWithConvention (Ian Lynagh, 2010-10-21; 1 file, -2/+3)

  We were using the supply's unique, and then passing the same supply to
  initUs_, which sounds like a bug waiting to happen.
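  Editorial sketch of the pattern involved, written against GHC's UniqSupply
  API as I understand it at this time (module names and the callers shown are
  my reading, not taken from the commit):

      import Unique     (Unique)
      import UniqSupply (UniqSupply, uniqFromSupply, takeUniqFromSupply)

      -- Risky: the supply whose unique we just consumed is handed on as-is,
      -- so a later consumer could draw the same unique again.
      risky :: UniqSupply -> (Unique, UniqSupply)
      risky us = (uniqFromSupply us, us)

      -- Safer: take the unique and the remaining supply in one step.
      safer :: UniqSupply -> (Unique, UniqSupply)
      safer us = takeUniqFromSupply us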
* Interruptible FFI calls with pthread_kill and CancelSynchronousIO. v4 (Edward Z. Yang, 2010-09-19; 2 files, -2/+3)

  This patch adds support for interruptible FFI calls in the form of a new
  foreign import keyword 'interruptible', which can be used instead of 'safe'
  or 'unsafe'. Interruptible FFI calls act like safe FFI calls, except that
  the worker thread they run on may be interrupted.

  Internally, it replaces BlockedOnCCall_NoUnblockEx with
  BlockedOnCCall_Interruptible, and changes the behavior of the RTS so that
  it does not modify the TSO_ flags in the event of an FFI call from a thread
  that was interruptible. It also modifies the bytecode format for foreign
  calls, adding an extra Word16 to indicate interruptibility.

  The semantics of interruption vary from platform to platform, but the
  intent is that any blocking system calls are aborted with an error code.
  This is most useful for making calls to system library functions that
  support being interrupted. There is no support for pre-Vista Windows.

  There is a partner testsuite patch which adds several tests for this
  functionality.
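  A hedged example of the new keyword in use (the choice of the POSIX
  sleep(3) function here is illustrative, not taken from the patch):

      {-# LANGUAGE ForeignFunctionInterface, InterruptibleFFI #-}
      import Foreign.C.Types (CUInt)

      -- An interruptible import: an asynchronous Haskell exception (e.g.
      -- from throwTo or timeout) can abort the blocked call.
      foreign import ccall interruptible "sleep"
        c_sleep :: CUInt -> IO CUInt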
* LLVM: Stop llvm saving stg caller-save regs across C calls (David Terei, 2010-07-05; 1 file, -1/+1)

  This is already handled by the Cmm code generator, so LLVM is simply
  duplicating work. LLVM also doesn't know which ones are actually live, so
  it saves them all, which causes a fair performance overhead for C calls on
  x64. We stop llvm saving them across the call by storing undef to them just
  before the call.
* FIX #3800: Store StgArrWords payload size in bytes (Antoine Latter, 2010-01-01; 2 files, -12/+6)
* Add new LLVM code generator to GHC. (Version 2) (David Terei, 2010-06-15; 2 files, -28/+181)

  This was done as part of an honours thesis at UNSW; the paper describing
  the work and results can be found at:

    http://www.cse.unsw.edu.au/~pls/thesis/davidt-thesis.pdf

  A homepage for the backend can be found at:

    http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Backends/LLVM

  Quick summary of performance: for the 'nofib' benchmark suite, run times
  are within 5% of the NCG (slightly slower) and generally better than the C
  code generator. For some code though, such as the DPH project's benchmarks,
  the LLVM code generator outperforms the NCG and C code generator by about a
  25% reduction in run times.
* omit "dyn" from the way appended to the __stginit labelSimon Marlow2010-04-281-8/+13
| | | | | When GHCi is linked dynamically, we still want to be able to load non-dynamic object files.
* New implementation of BLACKHOLEs (Simon Marlow, 2010-03-29; 4 files, -14/+42)

  This replaces the global blackhole_queue with a clever scheme that enables
  us to queue up blocked threads on the closure that they are blocked on,
  while still avoiding atomic instructions in the common case.

  Advantages:

    - gets rid of a locked global data structure and some tricky GC code
      (replacing it with some per-thread data structures and different tricky
      GC code :)
    - wakeups are more prompt: parallel/concurrent performance should
      benefit. I haven't seen anything dramatic in the parallel benchmarks so
      far, but a couple of threading benchmarks do improve a bit.
    - waking up a thread blocked on a blackhole is now O(1) (e.g. if it is
      the target of throwTo).
    - less sharing and better separation of Capabilities: communication is
      done with messages, the data structures are strictly owned by a
      Capability and cannot be modified except by sending messages.
    - this change will ultimately enable us to do more intelligent scheduling
      when threads block on each other. This is what started off the whole
      thing, but it isn't done yet (#3838).

  I'll be documenting all this on the wiki in due course.
* Never jump directly to a thunk's entry code, even if it is single-entry (Simon Marlow, 2010-03-25; 1 file, -10/+18)

  I don't think this fixes any bugs as we don't have single-entry thunks at
  the moment, but it could cause problems for parallel execution if we ever
  did re-introduce update avoidance.
* do_checks: do not set HpAlloc if the stack check fails (Simon Marlow, 2010-03-25; 1 file, -6/+16)

  This fixes a very rare heap corruption bug, whereby

    - a context switch is requested, which sets HpLim to zero
      (contextSwitchCapability(), called by the timer signal or another
      Capability);
    - simultaneously a stack check fails, in a code fragment that has both a
      stack and a heap check.

  The RTS then assumes that a heap-check failure has occurred and subtracts
  HpAlloc from Hp, although in fact it was a stack-check failure and
  retreating Hp will overwrite valid heap objects. The bug is that HpAlloc
  should only be set when Hp has been incremented by the heap check. See
  comments in rts/HeapStackCheck.cmm for more details.

  This bug is probably incredibly rare in practice, but I happened to be
  working on a test that triggers it reliably:
  concurrent/should_run/throwto001, compiled with -O -threaded, args 30 300
  +RTS -N2, run repeatedly in a loop.
* Comments only (simonpj@microsoft.com, 2010-03-04; 1 file, -0/+2)
* Beef up cmmMiniInline a tiny bit (Simon Marlow, 2010-02-16; 2 files, -9/+1)

  Allow a temporary assignment to be pushed past an assignment to a global if
  the global is not mentioned in the rhs of the assignment we are inlining.
  This fixes up some bad code. We should make sure we're doing something
  equivalent in the new backend in due course.
* Following Simon M's "take newCAF() out from sm_mutex" patch (dias@cs.tufts.edu, 2010-01-05; 1 file, -1/+4)
* Refactor PackageTarget back into StaticTarget (Ben.Lippmeier@anu.edu.au, 2010-01-04; 2 files, -13/+12)
* Tag ForeignCalls with the package they correspond to (Ben.Lippmeier@anu.edu.au, 2010-01-02; 7 files, -21/+42)
* take newCAF() out from sm_mutex; use the capability-local mut list instead (Simon Marlow, 2009-12-31; 1 file, -1/+4)
* Copying Simon M's fix for #650 to the new codegen (dias@cs.tufts.edu, 2009-12-22; 1 file, -2/+15)
* Better error checking and code cleanup (dias@cs.tufts.edu, 2009-12-22; 1 file, -6/+5)
* unused named variables (dias@cs.tufts.edu, 2009-12-18; 1 file, -2/+2)
* missed a case in a previous fix (dias@cs.tufts.edu, 2009-12-17; 1 file, -4/+26)

  Here's the obscure problem:

    -- However, we also want to allow an assignment to be generated
    -- in the case when the types are compatible, because this allows
    -- some slightly-dodgy but occasionally-useful casts to be used,
    -- such as in RtClosureInspect where we cast an HValue to a MutVar#
    -- so we can print out the contents of the MutVar#. If we generate
    -- code that enters the HValue, then we'll get a runtime panic, because
    -- the HValue really is a MutVar#. The types are compatible though,
    -- so we can just generate an assignment.
* Fix #650: use a card table to mark dirty sections of mutable arrays (Simon Marlow, 2009-12-17; 1 file, -3/+15)

  The card table is an array of bytes, placed directly following the actual
  array data. This means that array reading is unaffected, but array writing
  needs to read the array size from the header in order to find the card
  table.

  We use a bytemap rather than a bitmap, because updating the card table must
  be multi-thread safe. Each byte refers to 128 entries of the array, but
  this is tunable by changing the constant MUT_ARR_PTRS_CARD_BITS in
  includes/Constants.h.
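  Editorial illustration of the card arithmetic the message describes (the
  function names here are invented; only the constant follows from the text,
  since 128 entries per card means MUT_ARR_PTRS_CARD_BITS = 7):

      import Data.Bits (shiftR)

      -- Each card byte covers 2^7 = 128 array elements.
      mutArrPtrsCardBits :: Int
      mutArrPtrsCardBits = 7

      -- The card byte to mark dirty when element elemIx is written.
      cardIndex :: Int -> Int
      cardIndex elemIx = elemIx `shiftR` mutArrPtrsCardBits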
* Fix warnings about unused imports (Ben.Lippmeier@anu.edu.au, 2009-11-18; 1 file, -1/+5)