diff options
author | Andreas Klebinger <klebinger.andreas@gmx.at> | 2020-04-04 02:52:12 +0200 |
---|---|---|
committer | Marge Bot <ben+marge-bot@smart-cactus.org> | 2020-05-21 12:13:45 -0400 |
commit | 13f6c9d0376214b22d4cd16bd3a8cd7b8d864990 (patch) | |
tree | 58c8a3f7fa56f89112f3cd3182c8aa50eaeece5a /compiler/GHC/CmmToAsm/Reg/Linear.hs | |
parent | 78c6523c5106fc56b653fc14fda5741913da8fdc (diff) | |
download | haskell-13f6c9d0376214b22d4cd16bd3a8cd7b8d864990.tar.gz |
Refactor linear reg alloc to remember past assignments.
When assigning registers we now first try registers we
assigned to in the past, instead of picking the "first"
one.
This is in extremely helpful when dealing with loops for
which variables are dead for part of the loop.
This is important for patterns like this:
foo = arg1
loop:
use(foo)
...
foo = getVal()
goto loop;
There we:
* assign foo to the register of arg1.
* use foo, it's dead after this use as it's overwritten after.
* do other things.
* look for a register to put foo in.
If we pick an arbitrary one it might differ from the register the
start of the loop expect's foo to be in.
To fix this we simply look for past register assignments for
the given variable. If we find one and the register is free we
use that register.
This reduces the need for fixup blocks which match the register
assignment between blocks. In the example above between the end
and the head of the loop.
This patch also moves branch weight estimation ahead of register
allocation and adds a flag to control it (cmm-static-pred).
* It means the linear allocator is more likely to assign the hotter
code paths first.
* If it assign these first we are:
+ Less likely to spill on the hot path.
+ Less likely to introduce fixup blocks on the hot path.
These two measure combined are surprisingly effective. Based on nofib
we get in the mean:
* -0.9% instructions executed
* -0.1% reads/writes
* -0.2% code size.
* -0.1% compiler allocations.
* -0.9% compile time.
* -0.8% runtime.
Most of the benefits are simply a result of removing redundant moves
and spills.
Reduced compiler allocations likely are the result of less code being
generated. (The added lookup is mostly non-allocating).
Diffstat (limited to 'compiler/GHC/CmmToAsm/Reg/Linear.hs')
-rw-r--r-- | compiler/GHC/CmmToAsm/Reg/Linear.hs | 79 |
1 files changed, 61 insertions, 18 deletions
diff --git a/compiler/GHC/CmmToAsm/Reg/Linear.hs b/compiler/GHC/CmmToAsm/Reg/Linear.hs index 00b4915d7b..55cb73af1a 100644 --- a/compiler/GHC/CmmToAsm/Reg/Linear.hs +++ b/compiler/GHC/CmmToAsm/Reg/Linear.hs @@ -1,4 +1,5 @@ {-# LANGUAGE BangPatterns, CPP, ScopedTypeVariables #-} +{-# LANGUAGE ConstraintKinds #-} {-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-} @@ -137,6 +138,7 @@ import GHC.Platform import Data.Maybe import Data.List import Control.Monad +import Control.Applicative -- ----------------------------------------------------------------------------- -- Top level of the register allocator @@ -229,8 +231,13 @@ linearRegAlloc config entry_ids block_live sccs go f = linearRegAlloc' config f entry_ids block_live sccs platform = ncgPlatform config +-- | Constraints on the instruction instances used by the +-- linear allocator. +type OutputableRegConstraint freeRegs instr = + (FR freeRegs, Outputable freeRegs, Outputable instr, Instruction instr) + linearRegAlloc' - :: (FR freeRegs, Outputable instr, Instruction instr) + :: OutputableRegConstraint freeRegs instr => NCGConfig -> freeRegs -> [BlockId] -- ^ entry points @@ -246,7 +253,7 @@ linearRegAlloc' config initFreeRegs entry_ids block_live sccs return (blocks, stats, getStackUse stack) -linearRA_SCCs :: (FR freeRegs, Instruction instr, Outputable instr) +linearRA_SCCs :: OutputableRegConstraint freeRegs instr => [BlockId] -> BlockMap RegSet -> [NatBasicBlock instr] @@ -281,7 +288,7 @@ linearRA_SCCs entry_ids block_live blocksAcc (CyclicSCC blocks : sccs) more sanity checking to guard against this eventuality. -} -process :: (FR freeRegs, Instruction instr, Outputable instr) +process :: OutputableRegConstraint freeRegs instr => [BlockId] -> BlockMap RegSet -> [GenBasicBlock (LiveInstr instr)] @@ -325,15 +332,18 @@ process entry_ids block_live (b@(BasicBlock id _) : blocks) -- | Do register allocation on this basic block -- processBlock - :: (FR freeRegs, Outputable instr, Instruction instr) + :: OutputableRegConstraint freeRegs instr => BlockMap RegSet -- ^ live regs on entry to each basic block -> LiveBasicBlock instr -- ^ block to do register allocation on -> RegM freeRegs [NatBasicBlock instr] -- ^ block with registers allocated processBlock block_live (BasicBlock id instrs) - = do initBlock id block_live + = do -- pprTraceM "processBlock" $ text "" $$ ppr (BasicBlock id instrs) + initBlock id block_live + (instrs', fixups) <- linearRA block_live [] [] id instrs + -- pprTraceM "blockResult" $ ppr (instrs', fixups) return $ BasicBlock id instrs' : fixups @@ -369,7 +379,7 @@ initBlock id block_live -- | Do allocation for a sequence of instructions. linearRA - :: (FR freeRegs, Outputable instr, Instruction instr) + :: OutputableRegConstraint freeRegs instr => BlockMap RegSet -- ^ map of what vregs are live on entry to each block. -> [instr] -- ^ accumulator for instructions already processed. -> [NatBasicBlock instr] -- ^ accumulator for blocks of fixup code. @@ -396,7 +406,7 @@ linearRA block_live accInstr accFixups id (instr:instrs) -- | Do allocation for a single instruction. raInsn - :: (FR freeRegs, Outputable instr, Instruction instr) + :: OutputableRegConstraint freeRegs instr => BlockMap RegSet -- ^ map of what vregs are love on entry to each block. -> [instr] -- ^ accumulator for instructions already processed. -> BlockId -- ^ the id of the current block, for debugging @@ -476,7 +486,7 @@ isInReg src assig | Just (InReg _) <- lookupUFM assig src = True | otherwise = False -genRaInsn :: (FR freeRegs, Instruction instr, Outputable instr) +genRaInsn :: OutputableRegConstraint freeRegs instr => BlockMap RegSet -> [instr] -> BlockId @@ -486,6 +496,7 @@ genRaInsn :: (FR freeRegs, Instruction instr, Outputable instr) -> RegM freeRegs ([instr], [NatBasicBlock instr]) genRaInsn block_live new_instrs block_id instr r_dying w_dying = do +-- pprTraceM "genRaInsn" $ ppr (block_id, instr) platform <- getPlatform case regUsageOfInstr platform instr of { RU read written -> do @@ -525,6 +536,8 @@ genRaInsn block_live new_instrs block_id instr r_dying w_dying = do (fixup_blocks, adjusted_instr) <- joinToTargets block_live block_id instr +-- when (not $ null fixup_blocks) $ pprTraceM "genRA:FixBlocks" $ ppr fixup_blocks + -- Debugging - show places where the reg alloc inserted -- assignment fixup blocks. -- when (not $ null fixup_blocks) $ @@ -737,7 +750,7 @@ data SpillLoc = ReadMem StackSlot -- reading from register only in memory -- the list of free registers and free stack slots. allocateRegsAndSpill - :: (FR freeRegs, Outputable instr, Instruction instr) + :: forall freeRegs instr. (FR freeRegs, Outputable instr, Instruction instr) => Bool -- True <=> reading (load up spilled regs) -> [VirtualReg] -- don't push these out -> [instr] -- spill insns @@ -749,7 +762,8 @@ allocateRegsAndSpill _ _ spills alloc [] = return (spills, reverse alloc) allocateRegsAndSpill reading keep spills alloc (r:rs) - = do assig <- getAssigR + = do assig <- getAssigR :: RegM freeRegs (RegMap Loc) + -- pprTraceM "allocateRegsAndSpill:assig" (ppr (r:rs) $$ ppr assig) let doSpill = allocRegsAndSpill_spill reading keep spills alloc r rs assig case lookupUFM assig r of -- case (1a): already in a register @@ -779,6 +793,26 @@ allocateRegsAndSpill reading keep spills alloc (r:rs) | otherwise -> doSpill WriteNew +-- | Given a virtual reg find a preferred real register. +-- The preferred register is simply the first one the variable +-- was assigned to (if any). This way when we allocate for a loop +-- variables are likely to end up in the same registers at the +-- end and start of the loop, avoiding redundant reg-reg moves. +-- Note: I tried returning a list of past assignments, but that +-- turned out to barely matter but added a few tenths of +-- a percent to compile time. +findPrefRealReg :: forall freeRegs u. Uniquable u + => u -> RegM freeRegs (Maybe RealReg) +findPrefRealReg vreg = do + bassig <- getBlockAssigR :: RegM freeRegs (BlockMap (freeRegs,RegMap Loc)) + return $ foldr (findVirtRegAssig) Nothing bassig + where + findVirtRegAssig :: (freeRegs,RegMap Loc) -> Maybe RealReg -> Maybe RealReg + findVirtRegAssig assig z = + z <|> case lookupUFM (snd assig) vreg of + Just (InReg real_reg) -> Just real_reg + Just (InBoth real_reg _) -> Just real_reg + _ -> z -- reading is redundant with reason, but we keep it around because it's -- convenient and it maintains the recursive structure of the allocator. -- EZY @@ -795,18 +829,26 @@ allocRegsAndSpill_spill :: (FR freeRegs, Instruction instr, Outputable instr) allocRegsAndSpill_spill reading keep spills alloc r rs assig spill_loc = do platform <- getPlatform freeRegs <- getFreeRegsR - let freeRegs_thisClass = frGetFreeRegs platform (classOfVirtualReg r) freeRegs + let freeRegs_thisClass = frGetFreeRegs platform (classOfVirtualReg r) freeRegs :: [RealReg] - case freeRegs_thisClass of + -- Can we put the variable into a register it already was? + pref_reg <- findPrefRealReg r + case freeRegs_thisClass of -- case (2): we have a free register - (my_reg : _) -> - do spills' <- loadTemp r spill_loc my_reg spills + (first_free : _) -> + do let final_reg + | Just reg <- pref_reg + , reg `elem` freeRegs_thisClass + = reg + | otherwise + = first_free + spills' <- loadTemp r spill_loc final_reg spills - setAssigR (addToUFM assig r $! newLocation spill_loc my_reg) - setFreeRegsR $ frAllocateReg platform my_reg freeRegs + setAssigR (addToUFM assig r $! newLocation spill_loc final_reg) + setFreeRegsR $ frAllocateReg platform final_reg freeRegs - allocateRegsAndSpill reading keep spills' (my_reg : alloc) rs + allocateRegsAndSpill reading keep spills' (final_reg : alloc) rs -- case (3): we need to push something out to free up a register @@ -814,7 +856,8 @@ allocRegsAndSpill_spill reading keep spills alloc r rs assig spill_loc do let inRegOrBoth (InReg _) = True inRegOrBoth (InBoth _ _) = True inRegOrBoth _ = False - let candidates' = + let candidates' :: UniqFM Loc + candidates' = flip delListFromUFM keep $ filterUFM inRegOrBoth $ assig |