Refactor linear reg alloc to remember past assignments.

When assigning registers we now first try registers we assigned to in the
past, instead of picking the "first" one.

This is extremely helpful when dealing with loops in which variables are
dead for part of the loop. This is important for patterns like this:

        foo = arg1
    loop:
        use(foo)
        ...
        foo = getVal()
        goto loop;

There we:

* assign foo to the register of arg1.
* use foo, it's dead after this use as it's overwritten after.
* do other things.
* look for a register to put foo in.

If we pick an arbitrary one it might differ from the register the start of
the loop expects foo to be in. To fix this we simply look for past register
assignments for the given variable. If we find one and the register is free
we use that register.

This reduces the need for fixup blocks which match the register assignment
between blocks. In the example above, that is between the end and the head
of the loop.

This patch also moves branch weight estimation ahead of register allocation
and adds a flag to control it (-fcmm-static-pred).

* It means the linear allocator is more likely to assign the hotter code
  paths first.
* If it assigns these first we are:
  + Less likely to spill on the hot path.
  + Less likely to introduce fixup blocks on the hot path.

These two measures combined are surprisingly effective. Based on nofib we
get in the mean:

* -0.9% instructions executed
* -0.1% reads/writes
* -0.2% code size.
* -0.1% compiler allocations.
* -0.9% compile time.
* -0.8% runtime.

Most of the benefits are simply a result of removing redundant moves and
spills.

Reduced compiler allocations likely are the result of less code being
generated. (The added lookup is mostly non-allocating).
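
The register choice described in the commit message can be illustrated with a small sketch. This is not GHC's code (the allocator itself is written in Haskell); it is a toy model in Python, and every name in it (`pick_register`, `assign`, `history`, the register names `r1`..`r3`) is hypothetical. It only shows the idea: prefer a register the variable occupied before if that register is free, and otherwise fall back to the first free one.

```python
from typing import Dict, List, Optional, Set


def pick_register(var: str,
                  free: Set[str],
                  history: Dict[str, List[str]],
                  all_regs: List[str]) -> Optional[str]:
    """Choose a register for `var`, preferring ones it was assigned before."""
    # Try past assignments first, most recent first.
    for reg in reversed(history.get(var, [])):
        if reg in free:
            return reg
    # Otherwise fall back to the "first" free register in a fixed order.
    for reg in all_regs:
        if reg in free:
            return reg
    return None  # nothing free: a real allocator would have to spill


def assign(var: str,
           free: Set[str],
           history: Dict[str, List[str]],
           all_regs: List[str]) -> Optional[str]:
    """Pick a register, mark it as used and remember the choice."""
    reg = pick_register(var, free, history, all_regs)
    if reg is not None:
        free.discard(reg)
        history.setdefault(var, []).append(reg)
    return reg


# Toy replay of the loop example, assuming arg1 arrived in r2.
regs = ["r1", "r2", "r3"]
free = {"r1", "r3"}           # r2 is held by foo (taken over from arg1)
history = {"foo": ["r2"]}     # foo = arg1 put foo into r2

free.add("r2")                # use(foo): foo is dead afterwards, r2 is free
chosen = assign("foo", free, history, regs)   # foo = getVal()
```

With the history lookup `chosen` is `r2`, which matches what the loop head expects, so no fixup block is needed on the back edge in this toy model. Without the lookup the fallback order would hand out `r1` instead.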
author: Andreas Klebinger <klebinger.andreas@gmx.at> 2020-04-04 02:52:12 +0200
committer: Marge Bot <ben+marge-bot@smart-cactus.org> 2020-05-21 12:13:45 -0400
commit: 13f6c9d0376214b22d4cd16bd3a8cd7b8d864990 (patch)
tree: 58c8a3f7fa56f89112f3cd3182c8aa50eaeece5a /docs
parent: 78c6523c5106fc56b653fc14fda5741913da8fdc (diff)
download: haskell-13f6c9d0376214b22d4cd16bd3a8cd7b8d864990.tar.gz
2 files changed, 24 insertions, 4 deletions
diff --git a/docs/users_guide/8.12.1-notes.rst b/docs/users_guide/8.12.1-notes.rst
index 46a729af70..9fabc47310 100644
--- a/docs/users_guide/8.12.1-notes.rst
+++ b/docs/users_guide/8.12.1-notes.rst
@@ -10,7 +10,14 @@ following sections.
 Highlights
 ----------
 
-- TODO
+* NCG
+
+  - The linear register allocator saw improvements reducing the number
+    of redundant move instructions. Rare edge cases can see double
+    digit improvements in runtime for inner loops.
+
+    In the mean this improved runtime by about 0.8%. For details
+    see ticket #17823.
 
 Full details
 ------------
@@ -155,11 +162,11 @@ Arrow notation
    ``hsGroupTopLevelFixitySigs`` function, which collects all top-level fixity
    signatures, including those for class methods defined inside classes.
 
-- The ``Exception`` module was boiled down acknowledging the existence of 
+- The ``Exception`` module was boiled down acknowledging the existence of
   the ``exceptions`` dependency. In particular, the ``ExceptionMonad``
   class is not a proper class anymore, but a mere synonym for ``MonadThrow``,
-  ``MonadCatch``, ``MonadMask`` (all from ``exceptions``) and ``MonadIO``. 
-  All of ``g*``-functions from the module (``gtry``, ``gcatch``, etc.) are 
+  ``MonadCatch``, ``MonadMask`` (all from ``exceptions``) and ``MonadIO``.
+  All of ``g*``-functions from the module (``gtry``, ``gcatch``, etc.) are
   erased, and their ``exceptions``-alternatives are meant to be used in the
   GHC code instead.
 
diff --git a/docs/users_guide/using-optimisation.rst b/docs/users_guide/using-optimisation.rst
index 4ca47524a7..8ec19cb147 100644
--- a/docs/users_guide/using-optimisation.rst
+++ b/docs/users_guide/using-optimisation.rst
@@ -214,6 +214,19 @@ by saying ``-fno-wombat``.
     to their usage sites. It also inlines simple expressions like
     literals or registers.
 
+.. ghc-flag:: -fcmm-static-pred
+    :shortdesc: Enable static control flow prediction. Implied by :ghc-flag:`-O`.
+    :type: dynamic
+    :reverse: -fno-cmm-static-pred
+    :category:
+
+    :default: off but enabled with :ghc-flag:`-O`.
+
+    This enables static control flow prediction on the final Cmm
+    code. If enabled GHC will apply certain heuristics to identify
+    loops and hot code paths. This information is then used by the
+    register allocation and code layout passes.
+
 .. ghc-flag:: -fasm-shortcutting
     :shortdesc: Enable shortcutting on assembly. Implied by :ghc-flag:`-O2`.
     :type: dynamic