path: root/compiler/codeGen/StgCmmBind.hs
Commit message | Author | Age | Files | Lines
...
* Merge cgTailCall and cgLneJump into one function | Jan Stolarek | 2013-08-20 | 1 | -1/+1
    Previously the logic of these functions was something like this:

        cgIdApp x = case x of
            A -> cgLneJump x
            _ -> cgTailCall x

        cgTailCall x = case x of
            B -> ...
            C -> ...
            _ -> ...

    After merging there is no nesting of cases:

        cgIdApp x = case x of
            A -> -- body of cgLneJump
            B -> ...
            C -> ...
            _ -> ...
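The refactoring described in this commit can be sketched in a self-contained way. This is only an illustration under stated assumptions: the `Callee` type, its constructors, and the string results are hypothetical stand-ins, not the real STG identifiers or the code that cgIdApp emits in the FCode monad.

```haskell
-- Hypothetical stand-in for the information cgIdApp inspects.
data Callee = LneTarget | KnownFun | UnknownFun | Other
  deriving (Eq, Show)

-- Before: the outer case delegates to a second function with its own case.
cgIdAppNested :: Callee -> String
cgIdAppNested x = case x of
  LneTarget -> cgLneJump x
  _         -> cgTailCall x

cgLneJump :: Callee -> String
cgLneJump _ = "jump to let-no-escape block"

cgTailCall :: Callee -> String
cgTailCall x = case x of
  KnownFun   -> "direct call"
  UnknownFun -> "slow call"
  _          -> "enter closure"

-- After: one flat case; the body of cgLneJump is inlined as the first branch.
cgIdAppFlat :: Callee -> String
cgIdAppFlat x = case x of
  LneTarget  -> "jump to let-no-escape block"
  KnownFun   -> "direct call"
  UnknownFun -> "slow call"
  _          -> "enter closure"
```

Both versions return the same result for every constructor; the flat version simply avoids scrutinising `x` twice.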
* Remove unused module | Jan Stolarek | 2013-08-20 | 1 | -3/+0
    This commit removes the module StgCmmGran, which contains only no-op functions. According to comments in the module, it was used by GpH, but the GpH project seems to have been dead for a couple of years now.
* Cleanup StgCmm pass | Jan Stolarek | 2013-08-20 | 1 | -19/+19
    This cleanup includes:

      * removing dead code. This includes the forkStatics function, which was in fact one big no-op, and global bindings in CgInfoDownwards,
      * converting functions that used the FCode monad only to access DynFlags into functions that take DynFlags as a parameter and don't work in a monad,
      * the addBindC function is now smarter. It extracts the Id from the CgIdInfo passed to it in the same way addBindsC does. Previously this was done at every call site, which was redundant.
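The second item of this cleanup (turning monadic functions that only read DynFlags into pure functions) can be illustrated with a toy model. Everything here is an assumption made for the sketch: `DynFlags` is reduced to a single made-up field, `FCode` is modelled as a plain reader, and `closureBytes` is a hypothetical helper, not a function from the GHC code base.

```haskell
-- Hypothetical stand-in for GHC's DynFlags; only one field is modelled.
newtype DynFlags = DynFlags { wordSizeInBytes :: Int }

-- Hypothetical stand-in for the FCode monad: just a reader over DynFlags.
newtype FCode a = FCode { runFCode :: DynFlags -> a }

instance Functor FCode where
  fmap f (FCode g) = FCode (f . g)

instance Applicative FCode where
  pure x = FCode (const x)
  FCode f <*> FCode g = FCode (\d -> f d (g d))

instance Monad FCode where
  FCode g >>= k = FCode (\d -> runFCode (k (g d)) d)

getDynFlags :: FCode DynFlags
getDynFlags = FCode id

-- Before: the function lives in the monad only to read the flags.
closureBytesM :: Int -> FCode Int
closureBytesM nWords = do
  dflags <- getDynFlags
  return (nWords * wordSizeInBytes dflags)

-- After: a pure function that takes the flags as an ordinary argument.
closureBytes :: DynFlags -> Int -> Int
closureBytes dflags nWords = nWords * wordSizeInBytes dflags
```

The pure version can be called from any context that has the flags in hand, and its type makes it obvious that no code is emitted and no state is touched.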
* Trailing whitespaces, code formatting, detabify | Jan Stolarek | 2013-08-20 | 1 | -4/+4
    A major cleanup of trailing whitespace and tabs in the codeGen/ directory. I also adjusted code formatting in some places.
* Add final remaining bits to fix #7978. | Geoffrey Mainland | 2013-07-22 | 1 | -30/+1
* Add a work-around for #7978. | Geoffrey Mainland | 2013-06-22 | 1 | -2/+7
    This patch fixes profiling at the cost of losing cost centre accounting in a very small number of cases. I am working on a better fix.
* Wibbles (merg-os) to ticky-ticky | Simon Peyton Jones | 2013-06-06 | 1 | -2/+2
* Implement cardinality analysis | Simon Peyton Jones | 2013-06-06 | 1 | -10/+34
    This major patch implements the cardinality analysis described in our paper "Higher order cardinality analysis". It is joint work with Ilya Sergey and Dimitrios Vytiniotis.

    The basic idea is to augment the absence-analysis part of the demand analyser so that it can tell when something is used

        never
        at most once
        some other way

    The "at most once" information is used

        a) to enable transformations, and in particular to identify one-shot lambdas
        b) to allow updates on thunks to be omitted.

    There are two new flags, mainly there so you can do performance comparisons:

        -fkill-absence   stops GHC doing absence analysis at all
        -fkill-one-shot  stops GHC spotting one-shot lambdas and single-entry thunks

    The big changes are:

    * The Demand type is substantially refactored. In particular the UseDmd is factored as follows

          data UseDmd
            = UCall Count UseDmd
            | UProd [MaybeUsed]
            | UHead
            | Used

          data MaybeUsed = Abs | Use Count UseDmd

          data Count = One | Many

      Notice that UCall recurses straight to UseDmd, whereas UProd goes via MaybeUsed.

      The "Count" embodies the "at most once" or "many" idea.

    * The demand analyser itself was refactored a lot

    * The previously ad-hoc stuff in the occurrence analyser for foldr and build goes away entirely. Before, if we had

          build (\cn -> ...x... )

      then the "\cn" was hackily made one-shot (by spotting 'build' as special). That's essential to allow x to be inlined. Now the occurrence analyser propagates info gotten from 'build's strictness signature (so build isn't special); and that strictness sig is in turn derived entirely automatically. Much nicer!

    * The ticky stuff is improved to count single-entry thunks separately.

    One shortcoming is that there is no DEBUG way to spot if an allegedly-single-entry thunk is actually entered more than once. It would not be hard to generate a bit of code to check for this, and it would be reassuring. But it's fiddly and I have not done it.

    Despite all this fuss, the performance numbers are rather underwhelming. See the paper for more discussion.

              nucleic2   -0.8%  -10.9%   0.10   0.10   +0.0%
                sphere   -0.7%   -1.5%   0.08   0.08   +0.0%
        --------------------------------------------------------
                   Min   -4.7%  -10.9%  -9.3%  -9.3%  -50.0%
                   Max   -0.4%   +0.5%  +2.2%  +2.3%   +7.4%
        Geometric Mean   -0.8%   -0.2%  -1.3%  -1.3%   -1.8%

    I don't quite know how much credence to place in the runtime changes, but movement seems generally in the right direction.
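To make the role of `Count` concrete, here is a minimal sketch built on the three data types quoted in the commit message. The two predicates `usedAtMostOnce` and `isOneShotCall` are hypothetical helpers written for this illustration; they are not claimed to be the functions GHC's demand analyser actually uses.

```haskell
-- The types as given in the commit message.
data UseDmd
  = UCall Count UseDmd
  | UProd [MaybeUsed]
  | UHead
  | Used
  deriving (Eq, Show)

data MaybeUsed = Abs | Use Count UseDmd
  deriving (Eq, Show)

data Count = One | Many
  deriving (Eq, Show)

-- Hypothetical helper: a thunk whose usage is absent or "at most once"
-- is a candidate for omitting the update.
usedAtMostOnce :: MaybeUsed -> Bool
usedAtMostOnce Abs          = True
usedAtMostOnce (Use One _)  = True
usedAtMostOnce (Use Many _) = False

-- Hypothetical helper: a lambda is a one-shot candidate when the demand
-- on it is a call that happens at most once.
isOneShotCall :: UseDmd -> Bool
isOneShotCall (UCall One _) = True
isOneShotCall _             = False
```

For instance, `usedAtMostOnce (Use One Used)` holds while `usedAtMostOnce (Use Many Used)` does not, which mirrors the "at most once" versus "many" distinction the message describes.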
* extended ticky to also track "let"s that are not conventional closures | Nicolas Frisby | 2013-05-02 | 1 | -9/+13
    This includes selector, ap, and constructor thunks. They are still guarded by the -ticky-dyn-thk flag. (This is 024df664b600a with a small bug fix.)
* In CMM, only allow foreign calls to labels, not arbitrary expressions | Ian Lynagh | 2013-04-24 | 1 | -1/+1
    I'm not sure if we want to make this change permanently, but for now it fixes the unreg build.

    I've also removed some redundant special-case code that generated prototypes for foreign functions. The standard pprTempAndExternDecls now generates them.
* Revert "extended ticky to also track "let"s that are not closures" | Nicolas Frisby | 2013-04-12 | 1 | -14/+9
    This reverts commit 024df664b600a622cb8189ccf31789688505fc1c. Of course I gaff on my last day...
* extended ticky to also track "let"s that are not closures | Nicolas Frisby | 2013-04-12 | 1 | -9/+14
    This includes selector, ap, and constructor thunks. They are still guarded by the -ticky-dyn-thk flag.
* ticky enhancements | Nicolas Frisby | 2013-03-29 | 1 | -26/+17
    * the new StgCmmArgRep module breaks a dependency cycle; I also untabified it, but made no real changes
    * updated the documentation in the wiki and change the user guide to point there
    * moved the allocation enters for ticky and CCS to after the heap check
      * I left LDV where it was, which was before the heap check at least once, since I have no idea what it is
    * standardized all (active?) ticky alloc totals to bytes
    * in order to avoid double counting StgCmmLayout.adjustHpBackwards no longer bumps ALLOC_HEAP_ctr
    * I resurrected the SLOW_CALL counters
      * the new module StgCmmArgRep breaks cyclic dependency between Layout and Ticky (which the SLOW_CALL counters cause)
      * renamed them SLOW_CALL_fast_<pattern> and VERY_SLOW_CALL
    * added ALLOC_RTS_ctr and _tot ticky counters
      * eg allocation by Storage.c:allocate or a BUILD_PAP in stg_ap_*_info
      * resurrected ticky counters for ALLOC_THK, ALLOC_PAP, and ALLOC_PRIM
      * added -ticky and -DTICKY_TICKY in ways.mk for debug ways
    * added a ticky counter for total LNE entries
    * new flags for ticky: -ticky-allocd -ticky-dyn-thunk -ticky-LNE
      * all off by default
      * -ticky-allocd: tracks allocation *of* closure in addition to allocation *by* that closure
      * -ticky-dyn-thunk tracks dynamic thunks as if they were functions
      * -ticky-LNE tracks LNEs as if they were functions
    * updated the ticky report format, including making the argument categories (more?) accurate again
    * the printed name for things in the report include the unique of their ticky parent as well as if they are not top-level
* Tidy up: move info-table related stuff to CmmInfo | Simon Marlow | 2013-01-23 | 1 | -0/+1
    Prep for #709
* Code-size optimisation for top-level indirections (#7308) | Simon Marlow | 2012-11-19 | 1 | -9/+31
    Top-level indirections are often generated when there is a cast, e.g.

        foo :: T
        foo = bar `cast` (some coercion)

    For these we were generating a full-blown CAF, which is a fair chunk of code.

    This patch makes these indirections generate a single IND_STATIC closure (4 words) instead. This is exactly what the CAF would evaluate to eventually anyway, we're just shortcutting the whole process.
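As a source-level illustration of where such a cast can come from, consider a top-level binding that is only a newtype coercion of another top-level binding. The names `Age`, `bar`, and `foo` are made up for this sketch, and whether a given GHC version actually turns `foo` into an IND_STATIC closure depends on the compiler and optimisation settings; the snippet only shows the shape of program the message is talking about.

```haskell
import Data.Coerce (coerce)

-- A newtype has the same runtime representation as the type it wraps.
newtype Age = Age Int
  deriving (Eq, Show)

bar :: Int
bar = 42

-- At the Core level a binding like this is typically `bar` wrapped in a
-- cast, i.e. the "foo = bar `cast` (some coercion)" pattern from the message.
foo :: Age
foo = coerce bar
```

Because `foo` and `bar` share a representation, `foo` needs no code of its own to evaluate, which is why pointing it directly at `bar` is sufficient.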
* Fix the Slow calling convention (#7192) | Simon Marlow | 2012-11-13 | 1 | -10/+11
    The Slow calling convention passes the closure in R1, but we were ignoring this and hoping it would work, which it often did. However, this bug seems to have been the cause of #7192, because the graph-colouring allocator is more sensitive to having correct liveness information on jumps.
* Some alpha renaming | Ian Lynagh | 2012-10-16 | 1 | -5/+5
    Mostly d -> g (matching DynFlag -> GeneralFlag). Also renamed if* to when*, matching the Haskell if/when names
* Produce new-style Cmm from the Cmm parser | Simon Marlow | 2012-10-08 | 1 | -16/+25
    The main change here is that the Cmm parser now allows high-level cmm code with argument-passing and function calls. For example:

        foo ( gcptr a, bits32 b )
        {
          if (b > 0) {
             // we can make tail calls passing arguments:
             jump stg_ap_0_fast(a);
          }

          return (x,y);
        }

    More details on the new cmm syntax are in Note [Syntax of .cmm files] in CmmParse.y.

    The old syntax is still more-or-less supported for those occasional code fragments that really need to explicitly manipulate the stack. However there are a couple of differences: it is now obligatory to give a list of live GlobalRegs on every jump, e.g.

        jump %ENTRY_CODE(Sp(0)) [R1];

    Again, more details in Note [Syntax of .cmm files].

    I have rewritten most of the .cmm files in the RTS into the new syntax, except for AutoApply.cmm which is generated by the genapply program: this file could be generated in the new syntax instead and would probably be better off for it, but I ran out of enthusiasm.

    Some other changes in this batch:

     - The PrimOp calling convention is gone, primops now use the ordinary NativeNodeCall convention. This means that primops and "foreign import prim" code must be written in high-level cmm, but they can now take more than 10 arguments.
     - CmmSink now does constant-folding (should fix #7219)
     - .cmm files now go through the cmmPipeline, and as a result we generate better code in many cases. All the object files generated for the RTS .cmm files are now smaller. Performance should be better too, but I haven't measured it yet.
     - RET_DYN frames are removed from the RTS, lots of code goes away
     - we now have some more canned GC points to cover unboxed-tuples with 2-4 pointers, which will reduce code size a little.
* Change some "else return ()"s to use when/unless | Ian Lynagh | 2012-09-20 | 1 | -2/+1
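The kind of rewrite this commit subject describes can be shown with a small standalone example. The counter and the function names are invented for the illustration; only `when` and `unless` from Control.Monad are real library functions.

```haskell
import Control.Monad (unless, when)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Before: an explicit "else return ()" branch.
tickIfOld :: IORef Int -> Bool -> IO ()
tickIfOld counter enabled =
  if enabled then modifyIORef counter (+ 1) else return ()

-- After: `when` runs the action only if the condition holds.
tickIf :: IORef Int -> Bool -> IO ()
tickIf counter enabled = when enabled (modifyIORef counter (+ 1))

-- Before: the action sits in the else branch instead.
tickUnlessOld :: IORef Int -> Bool -> IO ()
tickUnlessOld counter disabled =
  if disabled then return () else modifyIORef counter (+ 1)

-- After: `unless` runs the action only if the condition does not hold.
tickUnless :: IORef Int -> Bool -> IO ()
tickUnless counter disabled = unless disabled (modifyIORef counter (+ 1))
```

The old and new versions behave identically; the combinators just remove the branch that does nothing.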
* Move tAG_BITS into platformConstants | Ian Lynagh | 2012-09-16 | 1 | -2/+2
* Move wORD_SIZE into platformConstants | Ian Lynagh | 2012-09-16 | 1 | -2/+1
* Start moving other constants from (Haskell)Constants to platformConstants | Ian Lynagh | 2012-09-14 | 1 | -2/+2
* Use oFFSET_* from platformConstants rather than Constants | Ian Lynagh | 2012-09-13 | 1 | -1/+1
* Use sIZEOF_* from platformConstants rather than Constants | Ian Lynagh | 2012-09-13 | 1 | -1/+1
* Pass DynFlags down to wordWidth | Ian Lynagh | 2012-09-12 | 1 | -4/+4
* Pass DynFlags down to bWord | Ian Lynagh | 2012-09-12 | 1 | -9/+11
    I've switched to passing DynFlags rather than Platform, as (a) it's simpler to not have to extract targetPlatform in so many places, and (b) it may be useful to have DynFlags around in future.
* Cleanup: add mkIntExpr and zeroExpr utils | Simon Marlow | 2012-08-31 | 1 | -1/+1
* remove tabs | Simon Marlow | 2012-08-21 | 1 | -124/+117
* Remove uses of fixC from the codeGen, and make the FCode monad strict | Simon Marlow | 2012-08-09 | 1 | -74/+110
* fix warning | Simon Marlow | 2012-08-07 | 1 | -1/+1
* entryHeapCheck: fix calls to stg_gc_fun and stg_gc_enter_1 | Simon Marlow | 2012-08-07 | 1 | -2/+2
    We weren't passing the arguments correctly to the GC functions, which usually happened to work because the arguments were in the right registers already. After this fix the profiling tests go through with the new code generator.
* Small optimisation | Simon Marlow | 2012-08-07 | 1 | -5/+6
    When calling newCAF, refer to the closure using its LocalReg rather than R1. Using R1 here was preventing the register allocator from coalescing the assignment x=R1 at the beginning of the function.
* fix a warning | Simon Marlow | 2012-08-07 | 1 | -1/+1
* Fix update frames for profiling | Simon Marlow | 2012-08-07 | 1 | -12/+16
* Cleanup and fixes to profiling | Simon Marlow | 2012-08-07 | 1 | -1/+5
* A closure with void args only should be a function, not a thunk | Simon Marlow | 2012-08-07 | 1 | -4/+3
* Generate one fewer temps per heap allocation | Simon Marlow | 2012-08-07 | 1 | -5/+6
    This saves compile time and can make a big difference in some pathological cases (T4801)
* Add "Unregisterised" as a field in the settings file | Ian Lynagh | 2012-08-07 | 1 | -2/+3
    To explicitly choose whether you want an unregisterised build you now need to use the "--enable-unregisterised"/"--disable-unregisterised" configure flags.
* Make tablesNextToCode "dynamic" | Ian Lynagh | 2012-08-06 | 1 | -3/+4
    This is a bit odd by itself, but it's a stepping stone on the way to putting "target unregisterised" into the settings file.
* Use "ReturnedTo" when generating safe foreign calls | Simon Marlow | 2012-08-06 | 1 | -4/+3
* Add a comment to explain why the FCode monad is lazy | Simon Marlow | 2012-08-06 | 1 | -1/+2
* Explicitly share some return continuations | Simon Marlow | 2012-08-02 | 1 | -2/+4
    Instead of relying on common-block-elimination to share return continuations in the common case (case-alternative heap checks) we do it explicitly. This isn't hard to do, is more robust, and saves some compilation time. Full commentary in Note [sharing continuations].
* Small optimisation to the code generated for CAFs | Simon Marlow | 2012-07-30 | 1 | -9/+14
* New codegen: do not split proc-points when using the NCG | Simon Marlow | 2012-07-30 | 1 | -1/+1
    Proc-point splitting is only required by backends that do not support having proc-points within a code block (that is, everything except the native backend, i.e. LLVM and C).

    Not doing proc-point splitting saves some compilation time, and might produce slightly better code in some cases.
* Make -fscc-profiling a dynamic flag | Ian Lynagh | 2012-07-24 | 1 | -23/+28
    All the flags that 'ways' imply are now dynamic
* remove some redundant SRT-related stuff | Simon Marlow | 2012-07-11 | 1 | -5/+3
* Merge remote-tracking branch 'origin/master' into newcg | Simon Marlow | 2012-07-04 | 1 | -22/+25
|\
| |   * origin/master: (756 commits)
| |     don't crash if argv[0] == NULL (#7037)
| |     -package P was loading all versions of P in GHCi (#7030)
| |     Add a Note, copying text from #2437
| |     improve the --help docs a bit (#7008)
| |     Copy Data.HashTable's hashString into our Util module
| |     Build fix
| |     Build fixes
| |     Parse error: suggest brackets and indentation.
| |     Don't build the ghc DLL on Windows; works around trac #5987
| |     On Windows, detect if DLLs have too many symbols; trac #5987
| |     Add some more Integer rules; fixes #6111
| |     Fix PA dfun construction with silent superclass args
| |     Add silent superclass parameters to the vectoriser
| |     Add silent superclass parameters (again)
| |     Mention Generic1 in the user's guide
| |     Make the GHC API a little more powerful.
| |     tweak llvm version warning message
| |     New version of the patch for #5461.
| |     Fix Word64ToInteger conversion rule.
| |     Implemented feature request on reconfigurable pretty-printing in GHCi (#5461)
| |     ...
| |
| |   Conflicts:
| |     compiler/basicTypes/UniqSupply.lhs
| |     compiler/cmm/CmmBuildInfoTables.hs
| |     compiler/cmm/CmmLint.hs
| |     compiler/cmm/CmmOpt.hs
| |     compiler/cmm/CmmPipeline.hs
| |     compiler/cmm/CmmStackLayout.hs
| |     compiler/cmm/MkGraph.hs
| |     compiler/cmm/OldPprCmm.hs
| |     compiler/codeGen/CodeGen.lhs
| |     compiler/codeGen/StgCmm.hs
| |     compiler/codeGen/StgCmmBind.hs
| |     compiler/codeGen/StgCmmLayout.hs
| |     compiler/codeGen/StgCmmUtils.hs
| |     compiler/main/CodeOutput.lhs
| |     compiler/main/HscMain.hs
| |     compiler/nativeGen/AsmCodeGen.lhs
| |     compiler/simplStg/SimplStg.lhs
| * Remove some unnecessary platform arguments | Ian Lynagh | 2012-06-13 | 1 | -7/+3
| * Pass DynFlags down to showSDocDump | Ian Lynagh | 2012-06-12 | 1 | -6/+10
|     To help with this, we now also pass DynFlags around inside the SpecM monad.
| * Fix for eager blackholing of thunks with no free variables (#6146) | Simon Marlow | 2012-06-07 | 1 | -7/+10
|     A thunk with no free variables was not getting blackholed when -feager-blackholing was on, but we were nevertheless pushing the stg_bh_upd_frame version of the update frame that expects to see a black hole.
|
|     I fixed this twice for good measure:
|
|      - we now call blackHoleOnEntry when pushing the update frame to check whether the closure was actually blackholed, and so that we use the same predicate in both places
|      - we now black hole thunks even if they have no free variables. These only occur when optimisation is off, but presumably if you say -feager-blackholing then that's what you want to happen.