diff options
author | sewardj <unknown> | 2000-06-15 08:38:25 +0000 |
---|---|---|
committer | sewardj <unknown> | 2000-06-15 08:38:25 +0000 |
commit | 665229e50c7704d70e8b8966c9f0e8d4623cf293 (patch) | |
tree | 7004c6a7692c702502a87fa8e2d74e5247240999 /ghc/compiler/nativeGen/AsmCodeGen.lhs | |
parent | 1364aa0bb9152ca00f433359487ec83ed7136df0 (diff) | |
download | haskell-665229e50c7704d70e8b8966c9f0e8d4623cf293.tar.gz |
[project @ 2000-06-15 08:38:25 by sewardj]
Major thing: new register allocator. Brief description follows.
Should correctly handle code with loops in, even though we don't
generate any such at the moment. A lot of comments. The previous
machinery for spilling is retained, as is the idea of a fast-and-easy
initial allocation attempt intended to deal with the majority of code
blocks (about 60% on x86) very cheaply. Many comments explaining
in detail how it works :-)
The Stix inliner is now on by default. Integer code seems to run
within about 1% of that -fvia-C. x86 fp code is significantly worse,
up to about 30% slower, depending on the amount of fp activity.
Minor thing: lazyfication of the top-level NCG plumbing, so that the
NCG doesn't require any greater residency than compiling to C, just a
bit more time. Created lazyThenUs and lazyMapUs for this purpose.
The new allocator is somewhat, although not catastophically, slower
than the old one. Fixing of the long-standing NCG space leak more
than makes up for it; overall hsc run-time is down about 5%, due to
significantly reduced GC time.
--------------------------------------------------------------------
Instructions are numbered sequentially, starting at zero.
A flow edge (FE) is a pair of insn numbers (MkFE Int Int) denoting
a possible flow of control from the first insn to the second.
The input to the register allocator is a list of instructions, which
mention Regs. A Reg can be a RealReg -- a real machine reg -- or a
VirtualReg, which carries a unique. After allocation, all the
VirtualReg references will have been converted into RealRegs, and
possibly some spill code will have been inserted.
The heart of the register allocator works in four phases.
1. (find_flow_edges) Calculate all the FEs for the code list.
Return them not as a [FE], but implicitly, as a pair of
Array Int [Int], being the successor and predecessor maps
for instructions.
2. (calc_liveness) Returns a FiniteMap FE RegSet. For each
FE, indicates the set of registers live on that FE. Note
that the set includes both RealRegs and VirtualRegs. The
former appear because the code could mention fixed register
usages, and we need to take them into account from the start.
3. (calc_live_range_sets) Invert the above mapping, giving a
FiniteMap Reg FeSet, indicating, for each virtual and real
reg mentioned in the code, which FEs it is live on.
4. (calc_vreg_to_rreg_mapping) For virtual reg, try and find
an allocatable real register for it. Each real register has
a "current commitment", indicating the set of FEs it is
currently live on. A virtual reg v can be assigned to
real reg r iff v's live-fe-set does not intersect with r's
current commitment fe-set. If the assignment is made,
v's live-fe-set is union'd into r's current commitment fe-set.
There is also the minor restriction that v and r must be of
the same register class (integer or floating).
Once this mapping is established, we simply apply it to the
input insns, and that's it.
If no suitable real register can be found, the vreg is mapped
to itself, and we deem allocation to have failed. The partially
allocated code is returned. The higher echelons of the allocator
(doGeneralAlloc and runRegAlloc) then cooperate to insert spill
code and re-run allocation, until a successful allocation is found.
Diffstat (limited to 'ghc/compiler/nativeGen/AsmCodeGen.lhs')
-rw-r--r-- | ghc/compiler/nativeGen/AsmCodeGen.lhs | 103 |
1 files changed, 51 insertions, 52 deletions
diff --git a/ghc/compiler/nativeGen/AsmCodeGen.lhs b/ghc/compiler/nativeGen/AsmCodeGen.lhs index f483095f82..d85bc69133 100644 --- a/ghc/compiler/nativeGen/AsmCodeGen.lhs +++ b/ghc/compiler/nativeGen/AsmCodeGen.lhs @@ -18,9 +18,10 @@ import PprMach import AbsCStixGen ( genCodeAbstractC ) import AbsCSyn ( AbstractC, MagicId ) +import AbsCUtils ( mkAbsCStmtList ) import AsmRegAlloc ( runRegAllocate ) import PrimOp ( commutableOp, PrimOp(..) ) -import RegAllocInfo ( mkMRegsState, MRegsState, findReservedRegs ) +import RegAllocInfo ( findReservedRegs ) import Stix ( StixTree(..), StixReg(..), pprStixTrees, pprStixTree, CodeSegment(..), stixCountTempUses, stixSubst, @@ -29,7 +30,8 @@ import Stix ( StixTree(..), StixReg(..), uniqOfNatM_State, deltaOfNatM_State ) import PrimRep ( isFloatingRep, PrimRep(..) ) import UniqSupply ( returnUs, thenUs, mapUs, initUs, - initUs_, UniqSM, UniqSupply ) + initUs_, UniqSM, UniqSupply, + lazyThenUs, lazyMapUs ) import MachMisc ( IF_ARCH_i386(i386_insert_ffrees,) ) import OrdList ( fromOL, concatOL ) @@ -87,38 +89,47 @@ So, here we go: \begin{code} nativeCodeGen :: AbstractC -> UniqSupply -> (SDoc, SDoc) nativeCodeGen absC us - = let (stixRaw, us1) = initUs us (genCodeAbstractC absC) - stixOpt = map genericOpt stixRaw - insns = initUs_ us1 (codeGen stixOpt) - debug_stix = vcat (map pprStixTrees stixOpt) - in {- trace "nativeGen: begin" -} (debug_stix, insns) -\end{code} - -@codeGen@ is the top-level code-generation function: -\begin{code} -codeGen :: [[StixTree]] -> UniqSM SDoc - -codeGen stixFinal - = mapUs genMachCode stixFinal `thenUs` \ dynamic_codes -> - let - fp_kludge :: [Instr] -> [Instr] - fp_kludge = IF_ARCH_i386(i386_insert_ffrees,id) - - static_instrss :: [[Instr]] - static_instrss = map fp_kludge (scheduleMachCode dynamic_codes) - docs = map (vcat . map pprInstr) static_instrss - - -- for debugging only - docs_prealloc = map (vcat . map pprInstr . fromOL) - dynamic_codes - text_prealloc = vcat (intersperse (char ' ' $$ char ' ') docs_prealloc) - in - --trace (showSDoc text_prealloc) ( - returnUs (vcat (intersperse (char ' ' - $$ ptext SLIT("# ___stg_split_marker") - $$ char ' ') - docs)) - --) + = let absCstmts = mkAbsCStmtList absC + (sdoc_pairs, us1) = initUs us (lazyMapUs absCtoNat absCstmts) + stix_sdocs = map fst sdoc_pairs + insn_sdocs = map snd sdoc_pairs + + insn_sdoc = my_vcat insn_sdocs + stix_sdoc = vcat stix_sdocs + +# if DEBUG + my_trace m x = trace m x + my_vcat sds = vcat (intersperse (char ' ' + $$ ptext SLIT("# ___stg_split_marker") + $$ char ' ') + sds) +# else + my_vcat sds = vcat sds + my_trace m x = x +# endif + in + my_trace "nativeGen: begin" + (stix_sdoc, insn_sdoc) + + +absCtoNat :: AbstractC -> UniqSM (SDoc, SDoc) +absCtoNat absC + = genCodeAbstractC absC `thenUs` \ stixRaw -> + genericOpt stixRaw `bind` \ stixOpt -> + genMachCode stixOpt `thenUs` \ pre_regalloc -> + regAlloc pre_regalloc `bind` \ almost_final -> + x86fp_kludge almost_final `bind` \ final_mach_code -> + vcat (map pprInstr final_mach_code) `bind` \ final_sdoc -> + pprStixTrees stixOpt `bind` \ stix_sdoc -> + returnUs (stix_sdoc, final_sdoc) + where + bind f x = x f + + x86fp_kludge :: [Instr] -> [Instr] + x86fp_kludge = IF_ARCH_i386(i386_insert_ffrees,id) + + regAlloc :: InstrBlock -> [Instr] + regAlloc = runRegAllocate allocatableRegs findReservedRegs \end{code} Top level code generator for a chunk of stix code. For this part of @@ -154,20 +165,6 @@ genMachCode stmts initial_us (int final_delta) \end{code} -The next bit does the code scheduling. The scheduler must also deal -with register allocation of temporaries. Much parallelism can be -exposed via the OrdList, but more might occur, so further analysis -might be needed. - -\begin{code} -scheduleMachCode :: [InstrBlock] -> [[Instr]] - -scheduleMachCode - = map (runRegAllocate freeRegsState findReservedRegs) - where - freeRegsState = mkMRegsState (extractMappedRegNos freeRegs) -\end{code} - %************************************************************************ %* * \subsection[NCOpt]{The Generic Optimiser} @@ -197,24 +194,26 @@ stixPeep :: [StixTree] -> [StixTree] -- second assignment would be substituted for, giving nonsense -- code. As far as I can see, StixTemps are only ever assigned -- to once. It would be nice to be sure! -{- + stixPeep ( t1@(StAssign pka (StReg (StixTemp u pk)) rhs) : t2 : ts ) | stixCountTempUses u t2 == 1 && sum (map (stixCountTempUses u) ts) == 0 - = trace ("nativeGen: stixInline: " ++ showSDoc (pprStixTree rhs)) + = +# ifdef DEBUG + trace ("nativeGen: inlining " ++ showSDoc (pprStixTree rhs)) +# endif (stixPeep (stixSubst u rhs t2 : ts)) stixPeep (t1:t2:ts) = t1 : stixPeep (t2:ts) stixPeep [t1] = [t1] stixPeep [] = [] --} -- disable stix inlining until we figure out how to fix the -- latent bugs in the register allocator which are exposed by -- the inliner. -stixPeep = id +--stixPeep = id \end{code} For most nodes, just optimize the children. |