author Andreas Klebinger <klebinger.andreas@gmx.at> 2018-11-17 11:20:36 +0100
committer Andreas Klebinger <klebinger.andreas@gmx.at> 2018-11-17 11:20:36 +0100
commit 912fd2b6ca0bc51076835b6e3d1f469b715e2760
tree ae1c96217e0eea77d0bfd53101d3fa868d45027d /docs
parent 6ba9aa5dd0a539adf02690a9c71d1589f541b3c5
NCG: New code layout algorithm.
Summary:
This patch implements a new code layout algorithm.
It has been tested for x86 and is disabled on other platforms.

Performance varies slightly by CPU/machine but in general seems to be better
by around 2%.
Nofib shows only small differences of about +/- ~0.5% overall, depending on
flags and machine; performance in other benchmarks improved significantly.

Other benchmarks include at least the benchmarks of: aeson, vector,
megaparsec, attoparsec, containers, text and xeno.

Three different CPUs were tested. While the magnitude of the gains differed,
all got faster, although to differing degrees. I tested: Sandy Bridge (Xeon),
Haswell, Skylake.

* Library benchmark results summarized:
  * containers: ~1.5% faster
  * aeson: ~2% faster
  * megaparsec: ~2-5% faster
  * xml library benchmarks: 0.2%-1.1% faster
  * vector-benchmarks: 1-4% faster
  * text: 5.5% faster

On average GHC compile times go down, as the speedup of a GHC compiled with
the new layout outweighs the overhead introduced by the new layout algorithm.

Things this patch does:

* Move code responsible for block layout into its own module.
* Move the NcgImpl class into the NCGMonad module.
* Extract a control flow graph from the input cmm.
* Update this cfg to keep it in sync with changes during asm codegen.
  This has been tested on x64 but should work on x86. Other platforms still
  use the old code layout.
* Assign weights to the edges in the CFG based on type and limited static
  analysis, which are then used for block layout.
* Once we have the final code layout, eliminate some redundant jumps.

  In particular turn a sequence of:
      jne .foo
      jmp .bar
    foo:
  into
      je bar
    foo:
      ..

Test Plan: ci

Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott

Reviewed By: RyanGlScott

Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton

GHC Trac Issues: #15124

Differential Revision: https://phabricator.haskell.org/D4726
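The redundant-jump elimination described in the commit message can be illustrated with a small sketch. This is not GHC's implementation: the tuple-based instruction encoding, the `INVERT` table and the function name `drop_redundant_jumps` are all hypothetical, chosen only to show the rewrite of a conditional jump over an unconditional one.

```python
# Hypothetical sketch, not GHC's code: instructions are (opcode, operand)
# tuples, and a label definition is encoded as ("label", name).
INVERT = {"je": "jne", "jne": "je", "jl": "jge", "jge": "jl",
          "jg": "jle", "jle": "jg"}


def drop_redundant_jumps(instrs):
    """Rewrite `jcc L; jmp M; L:` into `j(not cc) M; L:`."""
    out = []
    i = 0
    while i < len(instrs):
        window = instrs[i:i + 3]
        if (len(window) == 3
                and window[0][0] in INVERT
                and window[1][0] == "jmp"
                and window[2] == ("label", window[0][1])):
            # The conditional jump only skips the unconditional one, so
            # invert the condition and jump straight to the jmp target.
            out.append((INVERT[window[0][0]], window[1][1]))
            i += 2  # the label is kept; the next iteration emits it
        else:
            out.append(instrs[i])
            i += 1
    return out


before = [("jne", "foo"), ("jmp", "bar"), ("label", "foo"), ("ret", None)]
after = drop_redundant_jumps(before)
print(after)  # [('je', 'bar'), ('label', 'foo'), ('ret', None)]
```

The rewrite only fires when the label immediately follows the unconditional jump, which is why it has to run after the final block order is known.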
Diffstat (limited to 'docs')
-rw-r--r--  docs/users_guide/8.8.1-notes.rst         4
-rw-r--r--  docs/users_guide/debugging.rst           6
-rw-r--r--  docs/users_guide/using-optimisation.rst  54
3 files changed, 63 insertions, 1 deletions
diff --git a/docs/users_guide/8.8.1-notes.rst b/docs/users_guide/8.8.1-notes.rst
index 252db777bc..66ecdf0015 100644
--- a/docs/users_guide/8.8.1-notes.rst
+++ b/docs/users_guide/8.8.1-notes.rst
@@ -14,6 +14,7 @@ Highlights
The highlights, since the 8.6.1 release, are:
- Many, many bug fixes.
+- A new code layout algorithm for x86.
Full details
@@ -73,6 +74,9 @@ Compiler
- The :ghc-flag:`-fllvm-pass-vectors-in-regs` flag is now deprecated as vector
arguments are now passed in registers by default.
+- The :ghc-flag:`-fblock-layout-cfg` flag enables a new code layout algorithm on x86.
+ This is enabled by default at :ghc-flag:`-O` and :ghc-flag:`-O2`.
+
Runtime system
~~~~~~~~~~~~~~
diff --git a/docs/users_guide/debugging.rst b/docs/users_guide/debugging.rst
index 6a4c7fe1a7..1ffdf21eba 100644
--- a/docs/users_guide/debugging.rst
+++ b/docs/users_guide/debugging.rst
@@ -475,7 +475,13 @@ These flags dump various phases of GHC's C-\\- pipeline.
Dump the result of the C-\\- pipeline processing
+.. ghc-flag:: -ddump-cfg-weights
+ :shortdesc: Dump the assumed weights of the CFG.
+ :type: dynamic
+ Dumps the CFG with weights used by the new block layout code.
+ Each CFG is dumped as a graph in dot format, making it easy
+ to visualize.
LLVM code generator
~~~~~~~~~~~~~~~~~~~~~~
diff --git a/docs/users_guide/using-optimisation.rst b/docs/users_guide/using-optimisation.rst
index da066e158c..0048478683 100644
--- a/docs/users_guide/using-optimisation.rst
+++ b/docs/users_guide/using-optimisation.rst
@@ -45,7 +45,7 @@ optimisation to be performed, which can have an impact on how much of
your program needs to be recompiled when you change something. This is
one reason to stick to no-optimisation when developing code.
-**No ``-O*``-type option specified:** This is taken to mean “Please
+**No ``-O*``-type option specified:** This is taken to mean “Please
compile quickly; I'm not over-bothered about compiled-code quality.”
So, for example, ``ghc -c Foo.hs``
@@ -219,6 +219,58 @@ by saying ``-fno-wombat``.
This is mostly done during Cmm passes. However this can miss corner cases. So at -O2
we run the pass again at the asm stage to catch these.
+.. ghc-flag:: -fblock-layout-cfg
+ :shortdesc: Use the new cfg based block layout algorithm.
+ :type: dynamic
+ :reverse: -fno-block-layout-cfg
+ :category:
+
+ :default: off but enabled with :ghc-flag:`-O`.
+
+ The new algorithm considers all outgoing edges of a basic block for
+ code layout instead of only the last jump instruction.
+ It also builds a control flow graph for functions, tries to find
+ hot code paths, and places them sequentially, leading to better cache
+ utilization and performance.
+
+ This is expected to improve performance on average, but the actual
+ performance difference can vary.
+
+ If you find significant performance regressions that can be traced
+ back to obviously bad code layout, please open a ticket.
+
+.. ghc-flag:: -fblock-layout-weights
+ :shortdesc: Sets edge weights used by the new code layout algorithm.
+ :type: dynamic
+ :category:
+
+ This flag is hacker territory. The main purpose of this flag is to make
+ it easy to debug and tune the new code layout algorithm. There is no
+ guarantee that values giving better results now won't be worse with
+ the next release.
+
+ If you feel your code warrants modifying these settings, please consult
+ the source code for default values and documentation. However, doing so
+ is strongly discouraged.
+
+.. ghc-flag:: -fblock-layout-weightless
+ :shortdesc: Ignore cfg weights for code layout.
+ :type: dynamic
+ :reverse: -fno-block-layout-weightless
+ :category:
+
+ :default: off
+
+ When not using the CFG-based block layout, layout is determined either
+ by the last jump in a basic block or by the heaviest outgoing edge of the
+ block in the CFG.
+
+ With this flag enabled, only the last jump instruction in each block is used.
+ Without this flag, the old algorithm also uses the heaviest outgoing
+ edge.
+
+ When this flag is enabled and :ghc-flag:`-fblock-layout-cfg` is disabled,
+ block layout behaves the same as in 8.6 and earlier.
.. ghc-flag:: -fcpr-anal
:shortdesc: Turn on CPR analysis in the demand analyser. Implied by :ghc-flag:`-O`.