summaryrefslogtreecommitdiff
path: root/compiler/nativeGen/RegAlloc/Linear/Main.hs
diff options
context:
space:
mode:
authorAndreas Klebinger <klebinger.andreas@gmx.at>2018-11-17 11:20:36 +0100
committerAndreas Klebinger <klebinger.andreas@gmx.at>2018-11-17 11:20:36 +0100
commit912fd2b6ca0bc51076835b6e3d1f469b715e2760 (patch)
treeae1c96217e0eea77d0bfd53101d3fa868d45027d /compiler/nativeGen/RegAlloc/Linear/Main.hs
parent6ba9aa5dd0a539adf02690a9c71d1589f541b3c5 (diff)
downloadhaskell-912fd2b6ca0bc51076835b6e3d1f469b715e2760.tar.gz
NCG: New code layout algorithm.
Summary: This patch implements a new code layout algorithm. It has been tested for x86 and is disabled on other platforms. Performance varies slightly be CPU/Machine but in general seems to be better by around 2%. Nofib shows only small differences of about +/- ~0.5% overall depending on flags/machine performance in other benchmarks improved significantly. Other benchmarks includes at least the benchmarks of: aeson, vector, megaparsec, attoparsec, containers, text and xeno. While the magnitude of gains differed three different CPUs where tested with all getting faster although to differing degrees. I tested: Sandy Bridge(Xeon), Haswell, Skylake * Library benchmark results summarized: * containers: ~1.5% faster * aeson: ~2% faster * megaparsec: ~2-5% faster * xml library benchmarks: 0.2%-1.1% faster * vector-benchmarks: 1-4% faster * text: 5.5% faster On average GHC compile times go down, as GHC compiled with the new layout is faster than the overhead introduced by using the new layout algorithm, Things this patch does: * Move code responsilbe for block layout in it's own module. * Move the NcgImpl Class into the NCGMonad module. * Extract a control flow graph from the input cmm. * Update this cfg to keep it in sync with changes during asm codegen. This has been tested on x64 but should work on x86. Other platforms still use the old codelayout. * Assign weights to the edges in the CFG based on type and limited static analysis which are then used for block layout. * Once we have the final code layout eliminate some redundant jumps. In particular turn a sequences of: jne .foo jmp .bar foo: into je bar foo: .. Test Plan: ci Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott Reviewed By: RyanGlScott Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton GHC Trac Issues: #15124 Differential Revision: https://phabricator.haskell.org/D4726
Diffstat (limited to 'compiler/nativeGen/RegAlloc/Linear/Main.hs')
-rw-r--r--compiler/nativeGen/RegAlloc/Linear/Main.hs8
1 files changed, 7 insertions, 1 deletions
diff --git a/compiler/nativeGen/RegAlloc/Linear/Main.hs b/compiler/nativeGen/RegAlloc/Linear/Main.hs
index 6171d8d20d..bcac084c5f 100644
--- a/compiler/nativeGen/RegAlloc/Linear/Main.hs
+++ b/compiler/nativeGen/RegAlloc/Linear/Main.hs
@@ -147,7 +147,8 @@ regAlloc
-> UniqSM ( NatCmmDecl statics instr
, Maybe Int -- number of extra stack slots required,
-- beyond maxSpillSlots
- , Maybe RegAllocStats)
+ , Maybe RegAllocStats
+ )
regAlloc _ (CmmData sec d)
= return
@@ -523,6 +524,11 @@ genRaInsn block_live new_instrs block_id instr r_dying w_dying = do
(fixup_blocks, adjusted_instr)
<- joinToTargets block_live block_id instr
+ -- Debugging - show places where the reg alloc inserted
+ -- assignment fixup blocks.
+ -- when (not $ null fixup_blocks) $
+ -- pprTrace "fixup_blocks" (ppr fixup_blocks) (return ())
+
-- (e) Delete all register assignments for temps which are read
-- (only) and die here. Update the free register list.
releaseRegs r_dying