summaryrefslogtreecommitdiff
path: root/compiler/cmm/CmmImplementSwitchPlans.hs
Commit message (Collapse)AuthorAgeFilesLines
* Module hierarchy: Cmm (cf #13009)Sylvain Henry2020-01-251-116/+0
|
* Fix random typos [skip ci]nineonine2019-11-171-1/+1
|
* Add two CmmSwitch optimizations.Andreas Klebinger2019-07-131-3/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move switch expressions into a local variable when generating switches. This avoids duplicating the expression if we translate the switch to a tree search. This fixes #16933. Further we now check if all branches of a switch have the same destination, replacing the switch with a direct branch if that is the case. Both of these patterns appear in the ENTER macro used by the RTS but are unlikely to occur in intermediate Cmm generated by GHC. Nofib result summary: -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- Min -0.0% -0.0% -15.7% -15.6% 0.0% Max -0.0% 0.0% +5.4% +5.5% 0.0% Geometric Mean -0.0% -0.0% -1.0% -1.0% -0.0% Compiler allocations go up slightly: +0.2% Example output before and after the change taken from RTS code below. All but one of the memory loads `I32[_c3::I64 - 8]` are eliminated. Instead the data is loaded once from memory in block c6. Also the switch in block `ud` in the original code has been eliminated completely. Cmm without this commit: ``` stg_ap_0_fast() { // [R1] { [] } {offset ca: _c1::P64 = R1; // CmmAssign goto c2; // CmmBranch c2: if (_c1::P64 & 7 != 0) goto c4; else goto c6; c6: _c3::I64 = I64[_c1::P64]; if (I32[_c3::I64 - 8] < 26 :: W32) goto ub; else goto ug; ub: if (I32[_c3::I64 - 8] < 15 :: W32) goto uc; else goto ue; uc: if (I32[_c3::I64 - 8] < 8 :: W32) goto c7; else goto ud; ud: switch [8 .. 14] (%MO_SS_Conv_W32_W64(I32[_c3::I64 - 8])) { case 8, 9, 10, 11, 12, 13, 14 : goto c4; } ue: if (I32[_c3::I64 - 8] >= 25 :: W32) goto c4; else goto uf; uf: if (%MO_SS_Conv_W32_W64(I32[_c3::I64 - 8]) != 23) goto c7; else goto c4; c4: R1 = _c1::P64; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; ug: if (I32[_c3::I64 - 8] < 28 :: W32) goto uh; else goto ui; uh: if (I32[_c3::I64 - 8] < 27 :: W32) goto c7; else goto c8; ui: if (I32[_c3::I64 - 8] < 29 :: W32) goto c8; else goto c7; c8: _c1::P64 = P64[_c1::P64 + 8]; goto c2; c7: R1 = _c1::P64; call (_c3::I64)(R1) args: 8, res: 0, upd: 8; } } ``` Cmm with this commit: ``` stg_ap_0_fast() { // [R1] { [] } {offset ca: _c1::P64 = R1; goto c2; c2: if (_c1::P64 & 7 != 0) goto c4; else goto c6; c6: _c3::I64 = I64[_c1::P64]; _ub::I64 = %MO_SS_Conv_W32_W64(I32[_c3::I64 - 8]); if (_ub::I64 < 26) goto uc; else goto uh; uc: if (_ub::I64 < 15) goto ud; else goto uf; ud: if (_ub::I64 < 8) goto c7; else goto c4; uf: if (_ub::I64 >= 25) goto c4; else goto ug; ug: if (_ub::I64 != 23) goto c7; else goto c4; c4: R1 = _c1::P64; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; uh: if (_ub::I64 < 28) goto ui; else goto uj; ui: if (_ub::I64 < 27) goto c7; else goto c8; uj: if (_ub::I64 < 29) goto c8; else goto c7; c8: _c1::P64 = P64[_c1::P64 + 8]; goto c2; c7: R1 = _c1::P64; call (_c3::I64)(R1) args: 8, res: 0, upd: 8; } } ```
* compiler: introduce custom "GhcPrelude" PreludeHerbert Valerio Riedel2017-09-191-0/+2
| | | | | | | | | | | | | | | | | | This switches the compiler/ component to get compiled with -XNoImplicitPrelude and a `import GhcPrelude` is inserted in all modules. This is motivated by the upcoming "Prelude" re-export of `Semigroup((<>))` which would cause lots of name clashes in every modulewhich imports also `Outputable` Reviewers: austin, goldfire, bgamari, alanz, simonmar Reviewed By: bgamari Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari Differential Revision: https://phabricator.haskell.org/D3989
* Hoopl: remove dependency on Hoopl packageMichal Terepeta2017-06-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This copies the subset of Hoopl's functionality needed by GHC to `cmm/Hoopl` and removes the dependency on the Hoopl package. The main motivation for this change is the confusing/noisy interface between GHC and Hoopl: - Hoopl has `Label` which is GHC's `BlockId` but different than GHC's `CLabel` - Hoopl has `Unique` which is different than GHC's `Unique` - Hoopl has `Unique{Map,Set}` which are different than GHC's `Uniq{FM,Set}` - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is needed just to filter the exposed functions (filter out some of the Hoopl's and add the GHC ones) With this change, we'll be able to simplify this significantly. It'll also be much easier to do invasive changes (Hoopl is a public package on Hackage with users that depend on the current behavior) This should introduce no changes in functionality - it merely copies the relevant code. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: austin, bgamari, simonmar Reviewed By: bgamari, simonmar Subscribers: simonpj, kavon, rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3616
* Spelling in comments only [ci skip]Gabor Greif2017-03-281-1/+1
|
* Annotate CmmBranch with an optional likely targetSimon Marlow2015-09-231-2/+2
| | | | | | | | | | | | | | | | | Summary: This allows the code generator to give hints to later code generation steps about which branch is most likely to be taken. Right now it is only taken into account in one place: a special case in CmmContFlowOpt that swapped branches over to maximise the chance of fallthrough, which is now disabled when there is a likelihood setting. Test Plan: validate Reviewers: austin, simonpj, bgamari, ezyang, tibbe Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1273
* Refactor the story around switches (#10137)Joachim Breitner2015-03-301-0/+90
This re-implements the code generation for case expressions at the Stg → Cmm level, both for data type cases as well as for integral literal cases. (Cases on float are still treated as before). The goal is to allow for fancier strategies in implementing them, for a cleaner separation of the strategy from the gritty details of Cmm, and to run this later than the Common Block Optimization, allowing for one way to attack #10124. The new module CmmSwitch contains a number of notes explaining this changes. For example, it creates larger consecutive jump tables than the previous code, if possible. nofib shows little significant overall improvement of runtime. The rather large wobbling comes from changes in the code block order (see #8082, not much we can do about it). But the decrease in code size alone makes this worthwhile. ``` Program Size Allocs Runtime Elapsed TotalMem Min -1.8% 0.0% -6.1% -6.1% -2.9% Max -0.7% +0.0% +5.6% +5.7% +7.8% Geometric Mean -1.4% -0.0% -0.3% -0.3% +0.0% ``` Compilation time increases slightly: ``` -1 s.d. ----- -2.0% +1 s.d. ----- +2.5% Average ----- +0.3% ``` The test case T783 regresses a lot, but it is the only one exhibiting any regression. The cause is the changed order of branches in an if-then-else tree, which makes the hoople data flow analysis traverse the blocks in a suboptimal order. Reverting that gets rid of this regression, but has a consistent, if only very small (+0.2%), negative effect on runtime. So I conclude that this test is an extreme outlier and no reason to change the code. Differential Revision: https://phabricator.haskell.org/D720