author | Andreas Klebinger <klebinger.andreas@gmx.at> | 2018-11-17 11:20:36 +0100 |
---|---|---|
committer | Andreas Klebinger <klebinger.andreas@gmx.at> | 2018-11-17 11:20:36 +0100 |
commit | 912fd2b6ca0bc51076835b6e3d1f469b715e2760 (patch) | |
tree | ae1c96217e0eea77d0bfd53101d3fa868d45027d /compiler/nativeGen/PPC/Instr.hs | |
parent | 6ba9aa5dd0a539adf02690a9c71d1589f541b3c5 (diff) | |
download | haskell-912fd2b6ca0bc51076835b6e3d1f469b715e2760.tar.gz |
NCG: New code layout algorithm.
Summary:
This patch implements a new code layout algorithm.
It has been tested for x86 and is disabled on other platforms.
Performance varies slightly by CPU/machine but in general seems to be better
by around 2%.
Nofib shows only small differences of about +/- ~0.5% overall, depending on
flags/machine. Performance in other benchmarks improved significantly.
The other benchmarks include at least those of: aeson, vector, megaparsec, attoparsec,
containers, text and xeno.
Three different CPUs were tested. While the magnitude of the gains differed,
all got faster. I tested: Sandy Bridge (Xeon), Haswell,
and Skylake.
* Library benchmark results summarized:
  * containers: ~1.5% faster
  * aeson: ~2% faster
  * megaparsec: ~2-5% faster
  * xml library benchmarks: 0.2%-1.1% faster
  * vector-benchmarks: 1-4% faster
  * text: 5.5% faster
On average, GHC compile times go down, as the speedup of a GHC compiled with
the new layout outweighs the overhead introduced by the new layout algorithm.
Things this patch does:
* Move code responsible for block layout into its own module.
* Move the NcgImpl Class into the NCGMonad module.
* Extract a control flow graph from the input cmm.
* Update this CFG to keep it in sync with changes during
asm codegen. This has been tested on x64 but should work on x86.
Other platforms still use the old code layout.
* Assign weights to the edges in the CFG based on their type and limited static
analysis; these weights are then used for block layout.
* Once we have the final code layout, eliminate some redundant jumps.
  In particular, turn a sequence of:
      jne .foo
      jmp .bar
    .foo:
  into
      je .bar
    .foo:
      ..
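The jump elimination described in the last item can be sketched on a toy instruction type. This is only an illustration of the rewrite, not the code from this patch: `Instr`, `Cond`, `invert` and `shortcut` are hypothetical names, and the real pass works on the native code generator's own instruction types.

```haskell
-- Hypothetical toy instruction set; NOT GHC's X86.Instr.
data Cond = E | NE
  deriving (Eq, Show)

data Instr
  = JCC Cond String   -- conditional jump to a label
  | JMP String        -- unconditional jump to a label
  | LABEL String      -- start of a block
  | OTHER String      -- any other instruction
  deriving (Eq, Show)

-- Invert a condition code.
invert :: Cond -> Cond
invert E  = NE
invert NE = E

-- Rewrite   jcc l1; jmp l2; l1:   into   j(not cc) l2; l1:
-- The rewrite is only valid when the conditional jump targets the
-- label that immediately follows the unconditional jump.
shortcut :: [Instr] -> [Instr]
shortcut (JCC c l1 : JMP l2 : LABEL l : rest)
  | l1 == l = JCC (invert c) l2 : LABEL l : shortcut rest
shortcut (i : rest) = i : shortcut rest
shortcut []         = []
```

Applied to `[JCC NE "foo", JMP "bar", LABEL "foo"]`, `shortcut` yields `[JCC E "bar", LABEL "foo"]`. Sequences where the conditional jump targets some other label are left unchanged.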
Test Plan: ci
Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
Reviewed By: RyanGlScott
Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
GHC Trac Issues: #15124
Differential Revision: https://phabricator.haskell.org/D4726
Diffstat (limited to 'compiler/nativeGen/PPC/Instr.hs')
-rw-r--r-- | compiler/nativeGen/PPC/Instr.hs | 10 |
1 files changed, 6 insertions, 4 deletions
diff --git a/compiler/nativeGen/PPC/Instr.hs b/compiler/nativeGen/PPC/Instr.hs
index 8eb5e8fa8d..ade39430c0 100644
--- a/compiler/nativeGen/PPC/Instr.hs
+++ b/compiler/nativeGen/PPC/Instr.hs
@@ -100,9 +100,9 @@ allocMoreStack
   :: Platform
   -> Int
   -> NatCmmDecl statics PPC.Instr.Instr
-  -> UniqSM (NatCmmDecl statics PPC.Instr.Instr)
+  -> UniqSM (NatCmmDecl statics PPC.Instr.Instr, [(BlockId,BlockId)])
 
-allocMoreStack _ _ top@(CmmData _ _) = return top
+allocMoreStack _ _ top@(CmmData _ _) = return (top,[])
 allocMoreStack platform slots (CmmProc info lbl live (ListGraph code)) = do
     let
         infos = mapKeys info
@@ -121,8 +121,10 @@ allocMoreStack platform slots (CmmProc info lbl live (ListGraph code)) = do
         alloc = mkStackAllocInstr platform delta
         dealloc = mkStackDeallocInstr platform delta
 
+        retargetList = (zip entries (map mkBlockId uniqs))
+
         new_blockmap :: LabelMap BlockId
-        new_blockmap = mapFromList (zip entries (map mkBlockId uniqs))
+        new_blockmap = mapFromList retargetList
 
         insert_stack_insns (BasicBlock id insns)
             | Just new_blockid <- mapLookup id new_blockmap
@@ -156,7 +158,7 @@ allocMoreStack platform slots (CmmProc info lbl live (ListGraph code)) = do
             = concatMap insert_stack_insns code
 
     -- in
-    return (CmmProc info lbl live (ListGraph new_code))
+    return (CmmProc info lbl live (ListGraph new_code),retargetList)
 
 
 -- -----------------------------------------------------------------------------
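The diff makes `allocMoreStack` return the list of `(BlockId, BlockId)` pairs alongside the procedure, presumably so a caller can keep the control flow graph in sync with the newly created blocks. A minimal sketch of how such a list might be consumed is given below. It assumes that each pair `(old, new)` means block `new` was spliced in directly after `old`. The `CFG` type and the `spliceAfter` and `applyRetargets` functions are hypothetical stand-ins, not GHC's actual CFG API.

```haskell
import qualified Data.Map.Strict as M

-- Stand-ins for illustration only; GHC uses its own BlockId and CFG types.
type BlockId = Int
type CFG     = M.Map BlockId [BlockId]   -- block -> successor blocks

-- Assumption: (old, new) means 'new' was inserted right after 'old'.
-- 'new' inherits the successors of 'old', and 'old' now only
-- transfers control to 'new'.
spliceAfter :: CFG -> (BlockId, BlockId) -> CFG
spliceAfter cfg (old, new) =
  let succs = M.findWithDefault [] old cfg
  in  M.insert old [new] (M.insert new succs cfg)

-- Apply every pair from a retarget list to the graph, left to right.
applyRetargets :: CFG -> [(BlockId, BlockId)] -> CFG
applyRetargets = foldl spliceAfter
```

For the `CmmData` case the returned list is empty, so under this sketch the graph would be left untouched.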