summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorAndreas Klebinger <klebinger.andreas@gmx.at>2020-04-15 21:52:13 +0200
committerBen Gamari <ben@smart-cactus.org>2021-04-07 12:00:39 -0400
commit426b78b8d0cce951372e27027c530ea1832fa7c6 (patch)
treee62f09174813367d097f9ede8903903f48eeb478
parent7b894e6b820d2ff0be033e0edff55d1b70f78060 (diff)
downloadhaskell-426b78b8d0cce951372e27027c530ea1832fa7c6.tar.gz
NCG: Codelayout: Distinguish conditional and other branches.
In #18053 we ended up with a suboptimal code layout because the code layout algorithm didn't distinguish between conditional and unconditional control flow. We can completely eliminate unconditional control flow instructions by placing blocks next to each other, not so much for conditionals. In terms of implementation we simply give conditional branches less weight before computing the layout. Fixes #18053 (cherry picked from commit edc2cc588add3f23b3650f15d3f495943f2c06f9)
-rw-r--r--compiler/nativeGen/BlockLayout.hs57
1 files changed, 54 insertions, 3 deletions
diff --git a/compiler/nativeGen/BlockLayout.hs b/compiler/nativeGen/BlockLayout.hs
index 56e3177dd8..f5faba2d2e 100644
--- a/compiler/nativeGen/BlockLayout.hs
+++ b/compiler/nativeGen/BlockLayout.hs
@@ -239,7 +239,44 @@ import Control.Monad (foldM)
Assuming that Lwork is large the chance that the "call" ends up
in the same cache line is also fairly small.
--}
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ ~~~ Note [Layout relevant edge weights]
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ The input to the chain based code layout algorithm is a CFG
+ with edges annotated with their frequency. The frequency
+ of traversal corresponds quite well to the cost of not placing
+ the connected blocks next to each other.
+
+ However even if having the same frequency certain edges are
+ inherently more or less relevant to code layout.
+
+ In particular:
+
+ * Edges which cross an info table are less relevant than others.
+
+ If we place the blocks across this edge next to each other
+ they are still separated by the info table which negates
+ much of the benefit. It makes it less likely both blocks
+ will share a cache line reducing the benefits from locality.
+ But it also prevents us from eliminating jump instructions.
+
+ * Conditional branches and switches are slightly less relevant.
+
+ We can completely remove unconditional jumps by placing them
+ next to each other. This is not true for conditional branch edges.
+ We apply a small modifier to them to ensure edges for which we can
+ eliminate the overhead completely are considered first. See also #18053.
+
+ * Edges constituted by a call are ignored.
+
+ Considering these hardly helped with performance and ignoring
+ them helps quite a bit to improve compiler performance.
+
+ So we perform a preprocessing step where we apply a multiplicator
+ to these kinds of edges.
+
+ -}
-- | Look at X number of blocks in two chains to determine
@@ -653,17 +690,31 @@ sequenceChain info weights' blocks@((BasicBlock entry _):_) =
directEdges :: [CfgEdge]
directEdges = sortBy (flip compare) $ catMaybes . map relevantWeight $ (infoEdgeList weights)
where
+ -- Apply modifiers to turn edge frequencies into useable weights
+ -- for computing code layout.
+ -- See also Note [Layout relevant edge weights]
relevantWeight :: CfgEdge -> Maybe CfgEdge
relevantWeight edge@(CfgEdge from to edgeInfo)
| (EdgeInfo CmmSource { trans_cmmNode = CmmCall {} } _) <- edgeInfo
- -- Ignore edges across calls
+ -- Ignore edges across calls.
= Nothing
| mapMember to info
, w <- edgeWeight edgeInfo
- -- The payoff is small if we jump over an info table
+ -- The payoff is quite small if we jump over an info table
= Just (CfgEdge from to edgeInfo { edgeWeight = w/8 })
+ | (EdgeInfo CmmSource { trans_cmmNode = exitNode } _) <- edgeInfo
+ , cantEliminate exitNode
+ , w <- edgeWeight edgeInfo
+ -- A small penalty to edge types which
+ -- we can't optimize away by layout.
+ -- w * 0.96875 == w - w/32
+ = Just (CfgEdge from to edgeInfo { edgeWeight = w * 0.96875 })
| otherwise
= Just edge
+ where
+ cantEliminate CmmCondBranch {} = True
+ cantEliminate CmmSwitch {} = True
+ cantEliminate _ = False
blockMap :: LabelMap (GenBasicBlock i)
blockMap