path: root/docs/users_guide/using-optimisation.rst
Commit message | Author | Age | Files | Lines
* NCG: fast compilation of very large strings (#16190) | Sylvain Henry | 2019-02-14 | 1 | -0/+16

This patch adds an optimization to the NCG: for large strings (threshold configurable via the -fbinary-blob-threshold=NNN flag), instead of printing `.asciz "..."` in the generated ASM source, we print `.incbin "tmpXXX.dat"` and dump the contents of the string into a temporary "tmpXXX.dat" file. See the note for more details.
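The size-based choice described above can be sketched as a single function. This is a simplified illustration, not the NCG's code: the name `emitString`, the strict `>` comparison against the threshold, and the use of Haskell's `show` as a stand-in for assembler string escaping are all assumptions.

```haskell
-- Hypothetical sketch: choose between an inline .asciz directive and an
-- .incbin directive that refers to a separate data file.
-- Returns the directive plus, for large strings, the (file, contents)
-- pair that would have to be written to disk next to the assembly.
-- NOTE: 'show' applies Haskell escaping, which only approximates what an
-- assembler expects inside a quoted string.
emitString :: Int -> FilePath -> String -> (String, Maybe (FilePath, String))
emitString threshold tmpFile s
  | length s > threshold = (".incbin " ++ show tmpFile, Just (tmpFile, s))
  | otherwise            = (".asciz " ++ show s, Nothing)
```

One detail the sketch glosses over: `.asciz` appends a terminating zero byte, while `.incbin` copies the file verbatim, so the two paths only produce the same bytes if that terminator is accounted for.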
* Fix broken links (#16125) | Sven Tennie | 2019-01-05 | 1 | -6/+7
* Implement late lambda lift | Sebastian Graf | 2018-11-23 | 1 | -0/+52

Summary: This implements a selective lambda-lifting pass late in the STG pipeline.

Lambda lifting has the effect of avoiding closure allocation at the cost of having to make former free vars available at call sites, possibly enlarging closures surrounding call sites in turn. We identify beneficial cases by means of an analysis that estimates closure growth.

There's a Wiki page at https://ghc.haskell.org/trac/ghc/wiki/LateLamLift.

Reviewers: simonpj, bgamari, simonmar
Reviewed By: simonpj
Subscribers: rwbarton, carter
GHC Trac Issues: #9476
Differential Revision: https://phabricator.haskell.org/D5224
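To make the trade-off concrete, here is a hand-written, source-level before/after of lambda lifting. It is only an analogy: the real pass operates on STG and decides case by case using its closure-growth analysis, and the function names below are made up for illustration.

```haskell
-- Before: 'go' is a local function with the free variable 'k'.
-- Conceptually, a closure capturing 'k' is needed for 'go'.
sumPlusK :: Int -> [Int] -> Int
sumPlusK k = go
  where
    go []       = 0
    go (x : xs) = x + k + go xs

-- After lifting: the former free variable 'k' is an explicit parameter,
-- so 'goLifted' is a top-level function and needs no closure. The cost
-- is that every call site must now pass 'k' along.
goLifted :: Int -> [Int] -> Int
goLifted _ []       = 0
goLifted k (x : xs) = x + k + goLifted k xs

sumPlusKLifted :: Int -> [Int] -> Int
sumPlusKLifted k = goLifted k
```

Both versions compute the same result; for instance `sumPlusK 1 [1, 2, 3]` and `sumPlusKLifted 1 [1, 2, 3]` both evaluate to 9. The extra argument at each call is exactly the cost the closure-growth analysis has to weigh against the saved allocation.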
* NCG: New code layout algorithm. | Andreas Klebinger | 2018-11-17 | 1 | -1/+53

Summary: This patch implements a new code layout algorithm. It has been tested for x86 and is disabled on other platforms.

Performance varies slightly by CPU/machine but in general seems to be better by around 2%. Nofib shows only small differences of about +/- ~0.5% overall, depending on flags/machine. Performance in other benchmarks improved significantly; these include at least the benchmarks of aeson, vector, megaparsec, attoparsec, containers, text and xeno.

Three different CPUs were tested, and all got faster, although the magnitude of the gains differed. I tested: Sandy Bridge (Xeon), Haswell, Skylake.

  * Library benchmark results summarized:
    * containers: ~1.5% faster
    * aeson: ~2% faster
    * megaparsec: ~2-5% faster
    * xml library benchmarks: 0.2%-1.1% faster
    * vector-benchmarks: 1-4% faster
    * text: 5.5% faster

On average, GHC compile times go down: a GHC compiled with the new layout is faster by more than the overhead introduced by running the new layout algorithm.

Things this patch does:

  * Move code responsible for block layout into its own module.
  * Move the NcgImpl Class into the NCGMonad module.
  * Extract a control flow graph from the input cmm.
  * Update this cfg to keep it in sync with changes during asm codegen. This has been tested on x64 but should work on x86. Other platforms still use the old code layout.
  * Assign weights to the edges in the CFG based on type and limited static analysis, which are then used for block layout.
  * Once we have the final code layout, eliminate some redundant jumps. In particular, turn a sequence of:

          jne .foo
          jmp .bar
        foo:

    into

          je .bar
        foo:
        ..

Test Plan: ci
Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
Reviewed By: RyanGlScott
Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
GHC Trac Issues: #15124
Differential Revision: https://phabricator.haskell.org/D4726
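The jump-elimination step described in this commit can be modelled as a small peephole rewrite. The sketch below assumes a hypothetical, architecture-neutral `Instr` type with only two condition codes; GHC's NCG uses its own per-architecture instruction types and applies the rewrite to its final block layout.

```haskell
-- Hypothetical instruction type, for illustration only.
data Cond = Eq | Ne
  deriving (Eq, Show)

data Instr
  = JCond Cond String   -- conditional jump to a label
  | Jmp String          -- unconditional jump
  | Label String        -- start of a basic block
  | Other String        -- any other instruction
  deriving (Eq, Show)

negateCond :: Cond -> Cond
negateCond Eq = Ne
negateCond Ne = Eq

-- Rewrite "jcc foo; jmp bar; foo:" into "j(not cc) bar; foo:".
-- The conditional jump only skipped over the 'jmp', so inverting the
-- condition and jumping straight to 'bar' removes one jump.
invertJumps :: [Instr] -> [Instr]
invertJumps (JCond c foo : Jmp bar : Label foo' : rest)
  | foo == foo' = JCond (negateCond c) bar : Label foo' : invertJumps rest
invertJumps (i : rest) = i : invertJumps rest
invertJumps [] = []
```

For example, `invertJumps [JCond Ne "foo", Jmp "bar", Label "foo"]` yields `[JCond Eq "bar", Label "foo"]`, mirroring the `jne`/`jmp` to `je` rewrite above. When the label after the `jmp` is not the conditional target, the first clause's guard fails and the instructions are left unchanged.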
* vectorise: Put it out of its misery | Ben Gamari | 2018-06-02 | 1 | -52/+0

Poor DPH and its vectoriser have long been languishing; sadly it seems there is little chance that the effort will be rekindled. Every few years we discuss what to do with this mass of code and at least once we have agreed that it should be archived on a branch and removed from `master`. Here we do just that, eliminating heaps of dead code in the process.

Here we drop the ParallelArrays extension, the vectoriser, and the `vector` and `primitive` submodules.

Test Plan: Validate
Reviewers: simonpj, simonmar, hvr, goldfire, alanz
Reviewed By: simonmar
Subscribers: goldfire, rwbarton, thomie, mpickering, carter
Differential Revision: https://phabricator.haskell.org/D4761
* Make shortcutting at the asm stage toggleable and default for O2. | Andreas Klebinger | 2018-04-13 | 1 | -0/+17

Shortcutting during the asm stage of codegen is often redundant, as most cases get caught during the Cmm passes. For example, during compilation of all of nofib only 508 jumps are eliminated.

For this reason I moved the pass from -O1 to -O2. I also made it toggleable with -fasm-shortcutting.

Test Plan: ci
Reviewers: bgamari
Reviewed By: bgamari
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4555
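Jump shortcutting redirects a jump whose target block does nothing but jump again. A minimal sketch, assuming the jump-only blocks have already been collected into a hypothetical association list from label to target (GHC's actual pass works on its own block and label types):

```haskell
-- Hypothetical representation: labels as strings, plus an association
-- list mapping each block that consists solely of "jmp target" to target.
type Label = String

-- Follow a chain of jump-only blocks to its final destination.
-- The 'seen' list stops the search on cycles such as "a: jmp b; b: jmp a".
shortcut :: [(Label, Label)] -> Label -> Label
shortcut jumpOnly = go []
  where
    go seen l
      | l `elem` seen = l
      | otherwise = case lookup l jumpOnly of
          Just next -> go (l : seen) next
          Nothing   -> l
```

With `[("a", "b"), ("b", "c")]`, a jump to `a` can be rewritten into a jump straight to `c`. As the commit notes, most such chains are already removed by the Cmm passes, which is why the asm-stage version buys little at -O1.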
* Fix syntax in -flate-specialise docs | Simon Jakobi | 2018-03-29 | 1 | -0/+2
* Add -flate-specialise which runs a later specialisation pass | Matthew Pickering | 2018-03-19 | 1 | -0/+14

Runs another specialisation pass towards the end of the optimisation pipeline. This can catch specialisation opportunities which arose from the previous specialisation pass or other inlining.

You might want to use this if you have a type class method which returns a constrained type. For example, a type class where one of the methods implements a traversal.

It is not enabled by default or at any optimisation level; it is only enabled by manually passing the flag `-flate-specialise`.

Reviewers: bgamari
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4457
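A sketch of the kind of code the commit has in mind: a class whose method implements a traversal, used at a concrete `Applicative`. The class, instance and function names are invented for illustration, and whether the late pass actually finds additional specialisations here depends on the GHC version and on earlier inlining; comparing `-ddump-simpl` output with and without the flag is the way to check.

```haskell
{-# OPTIONS_GHC -O -flate-specialise #-}
-- Hypothetical example: a class whose method implements a traversal.
class Container f where
  ctraverse :: Applicative g => (a -> g b) -> f a -> g (f b)

newtype Box a = Box [a]
  deriving (Eq, Show)

instance Container Box where
  ctraverse f (Box xs) = Box <$> traverse f xs

-- A use at the concrete Applicative 'Maybe': a call site where
-- specialising 'ctraverse' to 'Maybe' could pay off.
halveAll :: Box Int -> Maybe (Box Int)
halveAll = ctraverse (\x -> if even x then Just (x `div` 2) else Nothing)
```

The flag changes only how the code is optimised, never its meaning: `halveAll (Box [2, 4])` is `Just (Box [1, 2])` either way.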
* Sort valid substitutions for typed holes by "relevance" | Matthías Páll Gissurarson | 2018-01-26 | 1 | -15/+1

This is an initial attempt at tackling the issue of how to order the suggestions provided by the valid substitutions checker, by sorting them according to a graph of how they subsume each other. We'd like to order them in such a manner that the most "relevant" suggestions are displayed first, so that the suggestion that the user might be looking for is displayed before more far-fetched suggestions (and thus also displayed when they'd otherwise be cut off by the `-fmax-valid-substitutions` limit).

The previous ordering was based on the order in which the elements appear in the list of imports, which I believe is less correlated with relevance than this ordering.

A drawback of this approach is that, since we now want to sort the elements, we can no longer "bail out early" when we've hit the `-fmax-valid-substitutions` limit.

Reviewers: bgamari, dfeuer
Reviewed By: dfeuer
Subscribers: dfeuer, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4326
* llvmGen: Pass vector arguments in vector registers by default | Ben Gamari | 2017-11-02 | 1 | -0/+12

Earlier this year Edward Kmett requested [1] that we enable passing of vector values in vector registers by default. The GHC calling convention changes have been in LLVM for a number of years now so let's just flip the switch.

[1] https://mail.haskell.org/pipermail/ghc-devs/2017-March/013905.html

Reviewers: austin
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D4142
* Implement a dedicated exitification pass #14152 | Joachim Breitner | 2017-10-29 | 1 | -0/+10

The idea is described in #14152, and can be summarized: float the exit path out of a joinrec, so that the simplifier can do more with it. See the test case for a nice example.

The floating goes against what the simplifier usually does, hence we need to be careful not to inline them back.

The position of exitification in the pipeline was chosen after a small amount of experimentation, but may need to be improved. For example, exitification can allow rewrite rules to fire, but for that it would have to happen before the `simpl_phases`.

Perf.haskell.org reports these nice performance wins:

    Nofib allocations
    fannkuch-redux    78446640  - 99.92%       64560
    k-nucleotide     109466384  - 91.32%     9502040
    simple            72424696  -  5.96%    68109560

    Nofib instruction counts
    fannkuch-redux  1744331636  -  3.86%  1676999519
    k-nucleotide    2318221965  -  6.30%  2172067260
    scs             1978470869  -  3.35%  1912263779
    simple           669858104  -  3.38%   647206739
    spectral-norm    186423292  -  5.37%   176411536

Differential Revision: https://phabricator.haskell.org/D3903
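A source-level picture of the transformation may help. The real pass works on Core join points (`joinrec`), not on Haskell source; the functions below are a hand-written analogy in which the exit path is moved into its own non-recursive binding that the loop merely jumps to.

```haskell
-- Hand-written analogy, not GHC output.
-- Before: the exit branch (i > n) computes its result inside the loop.
loopBefore :: Int -> Int
loopBefore n = go 0 1
  where
    go acc i
      | i > n     = n * n + acc          -- exit path inside the recursion
      | otherwise = go (acc + i) (i + 1)

-- After: the exit path lives in its own non-recursive binding 'exit',
-- abstracted over the loop-local variable 'acc'; the loop only jumps to it.
loopAfter :: Int -> Int
loopAfter n = go 0 1
  where
    exit acc = n * n + acc               -- floated-out exit path
    go acc i
      | i > n     = exit acc
      | otherwise = go (acc + i) (i + 1)
```

Both compute the same value (`loopBefore 3` and `loopAfter 3` are both 15). The point of the rewrite is that `exit` is no longer part of the recursive group, so the simplifier is free to optimise it on its own.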
* user-guide: Clarify default optimization flags | Ben Gamari | 2017-10-25 | 1 | -6/+7

Begins to fix #14214. [skip ci]

Test Plan: Read it.
Reviewers: austin
Subscribers: rwbarton, thomie
GHC Trac Issues: #14214
Differential Revision: https://phabricator.haskell.org/D4098
* users_guide: Convert mkUserGuidePart generation to a Sphinx extension | Patrick Dougherty | 2017-08-18 | 1 | -2/+283

This removes all dependencies the users guide had on `mkUserGuidePart`. The generation of the flag reference table and the various pieces of the man page is now entirely contained within the Sphinx extension `flags.py`. You can see the man page generation on the orphan page https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/ghc.html

The extension works by collecting all of the meta-data attached to the `ghc-flag` directives and then formatting and displaying it at `flag-print` directives. There is a single printing directive that can be customized with two options: what format to display (table, list, or block of flags) and an optional category to limit the output to (verbosity, warnings, codegen, etc.).

New display formats can be added by creating a function `generate_flag_xxx` (where `xxx` is a description of the format) which takes a list of flags and a category and returns a new `xxx`. Then just add a reference in the dispatch table `handlers`. That display can now be run by passing `:type: xxx` to the `flag-print` directive.

`flags.py` contains two maps of settings that can be adjusted. The first is a canonical list of flag categories, and the second sets default categories for files.

The only functionality that Sphinx could not replace was the `what_glasgow_exts_does.gen.rst` file. `mkUserGuidePart` actually just reads the list of flags from `compiler/main/DynFlags.hs`, which Sphinx cannot do. As the flag is deprecated, I added the list as a static file which can be updated manually.

Additionally, this patch updates every single documented flag with the data from `mkUserGuidePart` to generate the reference table.

Fixes #11654 and, incidentally, #12155.

Reviewers: austin, bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #11654, #12155
Differential Revision: https://phabricator.haskell.org/D3839
* users-guide: Standardize and repair all flag references | Patrick Dougherty | 2017-07-23 | 1 | -26/+29

This patch does three things:

1.) It simplifies the flag parsing code in `conf.py` to properly display flag definitions created by `.. (ghc|rts)-flag::`. Additionally, all flag references must include the associated arguments. Documentation has been added to `editing-guide.rst` to explain this.

2.) It normalizes all flag definitions to a similar format. Notably, all instances of `<>` have been replaced with `⟨⟩`. All references across the users guide have been updated to match.

3.) It fixes a couple issues with the flag reference table's generation code, which did not handle comma separated flags in the same cell and did not properly reference flags with arguments.

Test Plan: `SPHINXOPTS = -n` to activate "nitpicky" mode, which reports all broken references. All remaining errors are references to flags without any documentation.

Reviewers: austin, bgamari
Reviewed By: bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #13980
Differential Revision: https://phabricator.haskell.org/D3778
* Fix links to SPJ’s papers (fixes #12578) | Takenobu Tani | 2017-07-19 | 1 | -1/+1

This fixes #12578.

Update links to SPJ's papers in the following files:

  * compiler/coreSyn/CoreSyn.hs
  * docs/users_guide/using-optimisation.rst
  * docs/users_guide/parallel.rst
  * docs/users_guide/glasgow_exts.rst

This commit is for the ghc-8.2 branch.

Test Plan: build
Reviewers: austin, bgamari
Reviewed By: bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #12578
Differential Revision: https://phabricator.haskell.org/D3745
* Correct optimization flags documentation | Santiago Munin | 2017-06-08 | 1 | -5/+3

In a previous change (commit 4fd6207ec6960c429e6a1bcbe0282f625010f52a), the users guide was moved from XML to the RST format.

This process introduced a typo: "No -O*-type option specified:" was changed to "-O*" (which is not correct).

This change fixes it.

See result in: https://prnt.sc/fh332n

Fixes ticket #13756.

Reviewers: austin, bgamari
Reviewed By: bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #13756
Differential Revision: https://phabricator.haskell.org/D3631
* Typos in comments and manual [ci skip] | Gabor Greif | 2017-05-23 | 1 | -2/+2
* users-guide: Fix a variety of warnings | Ben Gamari | 2017-05-08 | 1 | -2/+1

Including #13665.
* Show valid substitutions for typed holes | Matthías Páll Gissurarson | 2017-03-29 | 1 | -0/+10

The idea is to implement a mechanism similar to PureScript, where they suggest which identifiers in scope would fit the given hole. In PureScript, they use subsumption (which is what we would like here as well). For subsumption, we would have to check, for each type in scope, whether the hole is a subtype of the given type, but that would require `tcSubType` and constraint satisfiability checking. Currently, `TcSimplify` uses a lot of functions from `TcErrors`, so that would require more of a rewrite. I will hold off on that for now and submit the simpler type-equality version.

As an example, consider

```
ps :: String -> IO ()
ps = putStrLn

ps2 :: a -> IO ()
ps2 _ = putStrLn "hello, world"

main :: IO ()
main = _ "hello, world"
```

The results would be something like

```
• Found hole: _ :: [Char] -> IO ()
• In the expression: _
  In a stmt of a 'do' block: _ "hello, world"
  In the expression: do _ "hello, world"
• Relevant bindings include
    main :: IO () (bound at test.hs:13:1)
    ps :: String -> IO () (bound at test.hs:7:1)
    ps2 :: forall a. a -> IO () (bound at test.hs:10:1)
  Valid substitutions include
    putStrLn :: String -> IO ()
      (imported from ‘Prelude’ at test.hs:1:1-14
       (and originally defined in ‘System.IO’))
    putStr :: String -> IO ()
      (imported from ‘Prelude’ at test.hs:1:1-14
       (and originally defined in ‘System.IO’))
```

We'd like here for ps2 to be suggested as well, but for that we require subsumption.

Reviewers: austin, bgamari, dfeuer, mpickering
Reviewed By: dfeuer, mpickering
Subscribers: mpickering, Wizek, dfeuer, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3361
* -fspec-constr-keen docs typos [skip ci] | Matthew Pickering | 2017-03-27 | 1 | -2/+2
* Update link to paper about demand analyser in user guide | Matthew Pickering | 2017-03-19 | 1 | -5/+2

Reviewers: austin, bgamari
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3354
* Add -fspec-constr-keen | Simon Peyton Jones | 2017-02-26 | 1 | -1/+11

I discovered that the dramatic improvement in perf/should_run/T9339 with the introduction of join points was really rather a fluke, and very fragile.

The real problem (see Note [Making SpecConstr keener]) is that SpecConstr wasn't specialising a function even though it was applied to a freshly-allocated constructor. The paper describes plausible reasons for this, but I think it may well be better to be a bit more aggressive.

So this patch adds -fspec-constr-keen, which makes SpecConstr a bit keener to specialise, by ignoring whether or not the argument corresponding to a call pattern is scrutinised in the function body.

Now the gains in T9339 should be robust; and it might even be a better default. I'd be interested in what happens if we switched on -fspec-constr-keen with -O2.

Reviewers: austin, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D3186
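The situation described above can be illustrated by hand. In the sketch below (invented names, not taken from T9339), `sumPairs` calls itself with a freshly built pair but never pattern-matches on that argument in its own body; it only passes it to `addPair`. Default SpecConstr might decline to specialise such a call, whereas `-fspec-constr-keen` ignores whether the argument is scrutinised. `sumPairsSpec` is a hand-written approximation of what a specialisation for the call pattern "pair" amounts to, not GHC's actual output.

```haskell
-- Recursive call with a freshly allocated pair; 'sumPairs' itself never
-- cases on 'acc', it only hands it to 'addPair'.
sumPairs :: Int -> (Int, Int) -> (Int, Int)
sumPairs 0 acc = acc
sumPairs n acc = sumPairs (n - 1) (addPair acc (n, n))

addPair :: (Int, Int) -> (Int, Int) -> (Int, Int)
addPair (a, b) (c, d) = (a + c, b + d)

-- Roughly what specialising on the pair pattern yields: the components
-- are passed as separate arguments, so no pair is allocated per iteration.
sumPairsSpec :: Int -> Int -> Int -> (Int, Int)
sumPairsSpec 0 x y = (x, y)
sumPairsSpec n x y = sumPairsSpec (n - 1) (x + n) (y + n)
```

Both forms agree on non-negative inputs, e.g. `sumPairs 3 (0, 0)` and `sumPairsSpec 3 0 0` are both `(6, 6)`.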
* users-guide: Document defaults for remaining optimization flags | Ben Gamari | 2017-02-08 | 1 | -10/+42
* Use top-level instances to solve superclasses where possible | Daniel Haraj | 2017-01-31 | 1 | -0/+31

This patch introduces a new flag `-fsolve-constant-dicts` which makes the constraint solver solve superclass constraints with available dictionaries if possible. The flag is enabled by `-O1`.

The motivation of this patch is that the compiler can produce more efficient code if the constraint solver uses top-level instance declarations to solve constraints that are currently solved from givens and their superclasses. In particular, as it currently stands, the compiler imposes a performance penalty on the common use-case where superclasses are bundled together for user convenience. The performance penalty applies to constraint synonyms as well. This example illustrates the issue:

```
{-# LANGUAGE ConstraintKinds, MultiParamTypeClasses, FlexibleContexts #-}
module B where

class M a b where m :: a -> b

type C a b = (Num a, M a b)

f :: C Int b => b -> Int -> Int
f _ x = x + 1
```

Output without the patch; notice that we get the instance for `Num Int` by using the class selector `p1`.

```
f :: forall b_arz. C Int b_arz => b_arz -> Int -> Int
f = \ (@ b_a1EB)
      ($d(%,%)_a1EC :: C Int b_a1EB)
      _
      (eta1_B1 :: Int) ->
      + @ Int
        (GHC.Classes.$p1(%,%) @ (Num Int) @ (M Int b_a1EB) $d(%,%)_a1EC)
        eta1_B1
        B.f1
```

Output with the patch, nicely optimised code!

```
f :: forall b. C Int b => b -> Int -> Int
f = \ (@ b) _ _ (x_azg :: Int) ->
      case x_azg of { GHC.Types.I# x1_a1DP ->
      GHC.Types.I# (GHC.Prim.+# x1_a1DP 1#)
      }
```

Reviewers: simonpj, bgamari, austin
Reviewed By: simonpj
Subscribers: mpickering, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D2714
GHC Trac Issues: #12791, #5835
* Document -fspecialise-aggressively | Matthew Pickering | 2017-01-23 | 1 | -0/+12

Reviewers: austin, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D3007
GHC Trac Issues: #12979
* Add a CSE pass to Stg (#9291) | Joachim Breitner | 2017-01-05 | 1 | -0/+8

This CSE pass only targets data constructor applications. This is probably the best we can do, as function calls and primitive operations might have side-effects.

Introduces the flag -fstg-cse, enabled by default with -O for now. It might also be a good candidate for -O2.

Differential Revision: https://phabricator.haskell.org/D2871
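Restricting CSE to constructor applications, as this pass does, can be illustrated on a toy language. The `Expr` type and the `cse` and `subst` functions are hypothetical; GHC's pass runs on STG and handles scoping properly, whereas this sketch assumes binders are never shadowed.

```haskell
-- Hypothetical mini-language: only constructor applications are CSE'd.
data Expr
  = Var String
  | ConApp String [String]          -- constructor applied to variables
  | Let String Expr Expr            -- let x = rhs in body
  deriving (Eq, Show)

-- If an identical constructor application is already bound in scope,
-- drop the new binding and refer to the existing binder instead.
cse :: Expr -> Expr
cse = go []
  where
    -- env maps already-bound constructor applications to their binder
    go env (Let x rhs@(ConApp _ _) body) =
      case lookup rhs env of
        Just y  -> go env (subst x y body)
        Nothing -> Let x rhs (go ((rhs, x) : env) body)
    go env (Let x rhs body) = Let x (go env rhs) (go env body)
    go _ e = e

-- Replace variable x by y everywhere (assumes no shadowing, for brevity).
subst :: String -> String -> Expr -> Expr
subst x y = s
  where
    r v = if v == x then y else v
    s (Var v)       = Var (r v)
    s (ConApp c vs) = ConApp c (map r vs)
    s (Let v e b)   = Let v (s e) (s b)
```

For `let a = Just x in let b = Just x in Pair a b`, the second binding is removed and the body becomes `Pair a a`. Function calls are deliberately left alone, matching the commit's reasoning about side-effects.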
* Scrutinee Constant Folding | Sylvain Henry | 2016-12-09 | 1 | -1/+20

This patch introduces new rules to perform constant folding through case-expressions.

E.g.,

```
case t -# 10# of _ {   ===>   case t of _ {
  5#      -> e1                 15#     -> e1
  8#      -> e2                 18#     -> e2
  DEFAULT -> e                  DEFAULT -> e
```

The initial motivation is that it allows "Merge Nested Cases" optimization to kick in and to further simplify the code (see Trac #12877).

Currently we recognize the following operations for Word# and Int#: Add, Sub, Xor, Not and Negate (for Int# only).

Test Plan: validate
Reviewers: simonpj, austin, bgamari
Reviewed By: simonpj, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2762
GHC Trac Issues: #12877
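The rewrite in the example above amounts to applying the inverse operation to every literal alternative. A minimal sketch on a hypothetical mini-AST (not GHC's Core types), covering only the Add, Sub and Xor cases:

```haskell
import Data.Bits (xor)

-- Hypothetical representation of "the scrutinee is t op k".
data Op = AddK Int | SubK Int | XorK Int
  deriving (Eq, Show)

data AltCon = LitAlt Int | Default
  deriving (Eq, Show)

-- Turn the alternatives of "case (t op k) of alts" into alternatives of
-- "case t of alts'": (t - k == n) iff (t == n + k), and similarly for the
-- other operations. DEFAULT alternatives are unchanged.
foldScrutinee :: Op -> [(AltCon, e)] -> [(AltCon, e)]
foldScrutinee op = map (\(con, rhs) -> (adjust con, rhs))
  where
    adjust Default    = Default
    adjust (LitAlt n) = LitAlt (inverse op n)
    inverse (AddK k) n = n - k
    inverse (SubK k) n = n + k
    inverse (XorK k) n = n `xor` k
```

Applied to the commit's example, `foldScrutinee (SubK 10) [(LitAlt 5, "e1"), (LitAlt 8, "e2"), (Default, "e")]` gives `[(LitAlt 15, "e1"), (LitAlt 18, "e2"), (Default, "e")]`. Because fixed-width integers wrap around, each inverse is exact, so the rewrite never changes which alternative is taken.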
* nativeGen: Allow -fregs-graph to be used | Ben Gamari | 2016-06-30 | 1 | -11/+17

Previously the flag was silently ignored due to #7679 and #8657. This, however, seems unnecessarily brutal and makes experimentation unduly difficult for users.

Test Plan: Validate
Reviewers: austin, simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2335
GHC Trac Issues: #7679, #8657
* Add flag to control number of missing patterns in warnings | David Luposchainsky | 2016-04-17 | 1 | -1/+8

Non-exhaustive pattern warnings had their number of patterns to show hardcoded in the past. This patch implements the TODO remark that this should be made a command line flag.

-fmax-uncovered-patterns=<n> can now be used to influence the number of patterns to be shown.

Reviewers: hvr, austin, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2076
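The effect of the flag amounts to truncating the list of reported patterns. A minimal sketch; the function name `renderUncovered` and the `...` marker for omitted patterns are assumptions about presentation, not GHC's code.

```haskell
-- Hypothetical sketch: show at most n uncovered patterns, followed by a
-- "..." marker when some were omitted.
renderUncovered :: Int -> [String] -> [String]
renderUncovered n pats
  | length pats > n = take n pats ++ ["..."]
  | otherwise       = pats
```

With a limit of 2, three missing constructors `A`, `B`, `C` would be reported as `A`, `B`, `...`.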
* Do not claim that -O2 does not do better than -O | Joachim Breitner | 2016-03-30 | 1 | -3/+0

when in fact it does. This was pointed out by Johannes Bechberger and supported with seemingly statistically sound evidence in his Bachelor thesis: of the benchmark shootout programs, 80% benefit significantly by switching from -O to -O2.

See https://uqudy.serpens.uberspace.de/blog/2016/02/08/ghc-performance-over-time/ for a few raw numbers.

Differential Revision: https://phabricator.haskell.org/D2065
* users_guide: Use semantic directive/role for command line options | Ben Gamari | 2016-01-09 | 1 | -247/+202

And GHCi commands. This makes cross-referencing much easier. Also normalize markup a bit and add some missing flags.
* Move user's guide to ReStructuredText | Ben Gamari | 2015-10-03 | 1 | -0/+780