| Commit message (Collapse) | Author | Age | Files | Lines |
| |
-fno-max-relevant-binds
| |
-O2 is the highest value of optimization.
-O3 will be reverted to -O2.
| |
The user's guide uses the `ghc-wiki` macro, and substitution rules
are complicated. So I manually edited `.rst` files without sed.
I changed `Commentary/Latedmd` only to a different page.
It is more appropriate as an example.
[ci skip]
| |
This moves all URL references to Trac tickets to their corresponding
GitLab counterparts.
| |
This patch adds an optimization into the NCG: for large strings
(threshold configurable via -fbinary-blob-threshold=NNN flag), instead
of printing `.asciz "..."` in the generated ASM source, we print
`.incbin "tmpXXX.dat"` and we dump the contents of the string into a
temporary "tmpXXX.dat" file.
See the note for more details.
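A minimal sketch of the decision described above, not the code in this patch. The type `StringDirective` and the functions `chooseDirective` and `render` are hypothetical names, and both the "threshold of 0 disables the blob" rule and the `>=` comparison are assumptions of this sketch.

```haskell
-- Hypothetical model of the two ways a string literal can be emitted.
data StringDirective
  = Asciz String      -- literal printed inline:      .asciz "..."
  | IncBin FilePath   -- literal dumped to a file:    .incbin "file"
  deriving (Eq, Show)

-- | Pick a directive for one string literal.
-- Assumption: a threshold of 0 means "never use a binary blob".
chooseDirective :: Int -> FilePath -> String -> StringDirective
chooseDirective threshold tmpFile str
  | threshold > 0 && length str >= threshold = IncBin tmpFile
  | otherwise                                 = Asciz str

-- | Render the directive as one line of assembly source.
-- 'show' applies Haskell string escaping, which only approximates
-- what an assembler expects; good enough for a sketch.
render :: StringDirective -> String
render (Asciz s)  = "\t.asciz " ++ show s
render (IncBin f) = "\t.incbin " ++ show f
```

With a threshold of 5, `chooseDirective 5 "tmp0.dat" "hello world"` selects the `.incbin` form, while a short literal stays inline.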
| |
Summary:
This implements a selective lambda-lifting pass late in the STG
pipeline.
Lambda lifting has the effect of avoiding closure allocation at the cost
of having to make former free vars available at call sites, possibly
enlarging closures surrounding call sites in turn.
We identify beneficial cases by means of an analysis that estimates
closure growth.
There's a Wiki page at
https://ghc.haskell.org/trac/ghc/wiki/LateLamLift.
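To illustrate the trade-off at source level (the pass itself works on STG, and this hand-written example is not taken from the patch): lifting the local function `go` removes the closure that captures `k`, at the cost of passing `k` at every call site. All names below are made up for the illustration.

```haskell
-- Before lifting: 'go' is a local function whose closure captures
-- the free variable 'k'.
sumScaled :: Int -> [Int] -> Int
sumScaled k = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (acc + k * x) xs

-- After lifting (done by hand here): 'goLifted' is a top-level
-- function, and the former free variable 'k' is an explicit argument.
goLifted :: Int -> Int -> [Int] -> Int
goLifted _ acc []       = acc
goLifted k acc (x : xs) = goLifted k (acc + k * x) xs

sumScaledLifted :: Int -> [Int] -> Int
sumScaledLifted k = goLifted k 0
```

Both versions compute the same result; only the allocation behaviour differs, which is what the closure growth analysis tries to estimate.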
Reviewers: simonpj, bgamari, simonmar
Reviewed By: simonpj
Subscribers: rwbarton, carter
GHC Trac Issues: #9476
Differential Revision: https://phabricator.haskell.org/D5224
| |
Summary:
This patch implements a new code layout algorithm.
It has been tested for x86 and is disabled on other platforms.
Performance varies slightly by CPU/machine but in general seems to be better
by around 2%.
Nofib shows only small differences of about +/- ~0.5% overall, depending on
flags/machine. Performance in other benchmarks improved significantly.
Other benchmarks include at least the benchmarks of: aeson, vector, megaparsec, attoparsec,
containers, text and xeno.
While the magnitude of the gains differed, three different CPUs were tested, with
all getting faster, although to differing degrees. I tested: Sandy Bridge (Xeon), Haswell,
Skylake.
* Library benchmark results summarized:
* containers: ~1.5% faster
* aeson: ~2% faster
* megaparsec: ~2-5% faster
* xml library benchmarks: 0.2%-1.1% faster
* vector-benchmarks: 1-4% faster
* text: 5.5% faster
On average GHC compile times go down, as the speedup of a GHC compiled with the
new layout outweighs the overhead introduced by running the new layout algorithm.
Things this patch does:
* Move code responsible for block layout into its own module.
* Move the NcgImpl Class into the NCGMonad module.
* Extract a control flow graph from the input cmm.
* Update this cfg to keep it in sync with changes during
asm codegen. This has been tested on x64 but should work on x86.
Other platforms still use the old codelayout.
* Assign weights to the edges in the CFG based on type and limited static
analysis which are then used for block layout.
* Once we have the final code layout eliminate some redundant jumps.
In particular turn a sequence of:
jne .foo
jmp .bar
foo:
into
je bar
foo:
..
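The jump elimination in the last item can be sketched as a peephole pass over a toy instruction list. The `Instr` type and `elimJumps` are hypothetical and far simpler than the real x86 instruction type in the NCG; the symmetric `je` case is an addition of this sketch.

```haskell
-- Toy instruction set, invented for this illustration.
data Instr
  = Jne String     -- jump to label if not equal
  | Je String      -- jump to label if equal
  | Jmp String     -- unconditional jump
  | Label String   -- start of a block
  | Other String   -- any other instruction
  deriving (Eq, Show)

-- | Rewrite   jne L; jmp M; L:   into   je M; L:
-- The condition is inverted and retargeted, so the "not equal" case
-- now simply falls through to L.
elimJumps :: [Instr] -> [Instr]
elimJumps (Jne l : Jmp m : Label l' : rest)
  | l == l' = Je m : Label l' : elimJumps rest
elimJumps (Je l : Jmp m : Label l' : rest)
  | l == l' = Jne m : Label l' : elimJumps rest
elimJumps (i : rest) = i : elimJumps rest
elimJumps []         = []
```

The rewrite only fires when the conditional jump targets the label that immediately follows the unconditional jump, which is why it has to run after the final block order is known.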
Test Plan: ci
Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
Reviewed By: RyanGlScott
Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
GHC Trac Issues: #15124
Differential Revision: https://phabricator.haskell.org/D4726
| |
Poor DPH and its vectoriser have long been languishing; sadly it seems there is
little chance that the effort will be rekindled. Every few years we discuss
what to do with this mass of code and at least once we have agreed that it
should be archived on a branch and removed from `master`. Here we do just that,
eliminating heaps of dead code in the process.
Here we drop the ParallelArrays extension, the vectoriser, and the `vector` and
`primitive` submodules.
Test Plan: Validate
Reviewers: simonpj, simonmar, hvr, goldfire, alanz
Reviewed By: simonmar
Subscribers: goldfire, rwbarton, thomie, mpickering, carter
Differential Revision: https://phabricator.haskell.org/D4761
| |
Shortcutting during the asm stage of codegen is often redundant as most
cases get caught during the Cmm passes. For example during compilation
of all of nofib only 508 jumps are eliminated.
For this reason I moved the pass from -O1 to -O2. I also made it
toggleable with -fasm-shortcutting.
Test Plan: ci
Reviewers: bgamari
Reviewed By: bgamari
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4555
| |
Runs another specialisation pass towards the end of the optimisation
pipeline. This can catch specialisation opportunities which arose from
the previous specialisation pass or other inlining.
You might want to use this if you have a type class method
which returns a constrained type. For example, a type class where one
of the methods implements a traversal.
It is not enabled by default or at any optimisation level; it runs only
when the flag `-flate-specialise` is passed explicitly.
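A sketch of the kind of class this is aimed at, with made-up names (`Container`, `visit`, `Pair`, `incrementAll`). Whether the late pass actually fires on this exact code is not verified here; compare the Core produced with and without `-flate-specialise` to find out.

```haskell
import Data.Functor.Identity (Identity (..))

-- Hypothetical class: 'visit' is a traversal, and its type carries a
-- constraint of its own (Applicative f) in addition to the class head.
class Container t where
  visit :: Applicative f => (a -> f b) -> t a -> f (t b)

data Pair a = Pair a a
  deriving (Eq, Show)

instance Container Pair where
  visit f (Pair x y) = Pair <$> f x <*> f y

-- A use site where both the container and the Applicative are known.
-- Specialising 'visit' here depends on earlier passes exposing the call.
incrementAll :: Pair Int -> Pair Int
incrementAll = runIdentity . visit (Identity . (+ 1))
```

Compile with something like `ghc -O2 -flate-specialise` to try the pass on this module.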
Reviewers: bgamari
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4457
| |
This is an initial attempt at tackling the issue of how to order the
suggestions provided by the valid substitutions checker, by sorting
them by creating a graph of how they subsume each other. We'd like to
order them in such a manner that the most "relevant" suggestions are
displayed first, so that the suggestion that the user might be looking
for is displayed before more far-fetched suggestions (and thus also
displayed when they'd otherwise be cut-off by the
`-fmax-valid-substitutions` limit). The previous ordering was based on
the order in which the elements appear in the list of imports, which I
believe is less correlated with relevance than this ordering.
A drawback of this approach is that, since we now want to sort the
elements, we can no longer "bail out early" when we've hit the
`-fmax-valid-substitutions` limit.
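The ordering idea can be sketched as a topological sort over a subsumption relation. This is not the implementation in the patch: `sortBySubsumption` is a hypothetical name, the relation is supplied by the caller, it is assumed to be a strict partial order, and putting the more specific candidates first is a choice made for this sketch.

```haskell
import Data.List (partition)

-- | Order candidates so that each one appears only after every
-- candidate it subsumes. 'subsumes a b' means "a is more general
-- than b" and is assumed irreflexive and acyclic.
sortBySubsumption :: (a -> a -> Bool) -> [a] -> [a]
sortBySubsumption subsumes = go
  where
    go [] = []
    go xs =
      case partition isLeast xs of
        -- No candidate is free of dependencies: the relation has a
        -- cycle, so fall back to the remaining input order.
        ([], _)       -> xs
        (ready, rest) -> ready ++ go rest
      where
        -- A candidate is ready once it subsumes nothing that remains.
        isLeast x = not (any (subsumes x) xs)
```

Every candidate has to be compared with every other one before the first can be emitted, which shows concretely why bailing out early at the limit is no longer possible.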
Reviewers: bgamari, dfeuer
Reviewed By: dfeuer
Subscribers: dfeuer, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4326
| |
Earlier this year Edward Kmett requested [1] that we enable passing of
vector values in vector registers by default. The GHC calling convention
changes have been in LLVM for a number of years now so let's just flip
the switch.
[1] https://mail.haskell.org/pipermail/ghc-devs/2017-March/013905.html
Reviewers: austin
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D4142
| |
The idea is described in #14152, and can be summarized: Float the exit
path out of a joinrec, so that the simplifier can do more with it.
See the test case for a nice example.
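A hand-written, source-level picture of the same idea follows. The real transformation works on join points in Core, and the names `countdown`, `countdownExit` and `exit` are invented for this sketch.

```haskell
-- Before: the exit path (the right-hand side of the first equation)
-- lives inside the recursive function.
countdown :: Int -> Int -> Int
countdown n0 acc0 = go n0 acc0
  where
    go 0 acc = acc * acc + 1          -- exit path
    go n acc = go (n - 1) (acc + n)   -- recursive path

-- After: the exit path is bound outside the loop as a non-recursive
-- function, so it can be optimised independently of 'go'.
countdownExit :: Int -> Int -> Int
countdownExit n0 acc0 = go n0 acc0
  where
    exit acc = acc * acc + 1
    go 0 acc = exit acc
    go n acc = go (n - 1) (acc + n)
```

Both functions assume a non-negative counter and return the same value for the same arguments.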
The floating goes against what the simplifier usually does, hence we
need to be careful not to inline them back.
The position of exitification in the pipeline was chosen after a small
amount of experimentation, but may need to be improved. For example,
exitification can allow rewrite rules to fire, but for that it would
have to happen before the `simpl_phases`.
Perf.haskell.org reports these nice performance wins:
Nofib allocations
fannkuch-redux 78446640 - 99.92% 64560
k-nucleotide 109466384 - 91.32% 9502040
simple 72424696 - 5.96% 68109560
Nofib instruction counts
fannkuch-redux 1744331636 - 3.86% 1676999519
k-nucleotide 2318221965 - 6.30% 2172067260
scs 1978470869 - 3.35% 1912263779
simple 669858104 - 3.38% 647206739
spectral-norm 186423292 - 5.37% 176411536
Differential Revision: https://phabricator.haskell.org/D3903
| |
Begins to fix #14214.
[skip ci]
Test Plan: Read it.
Reviewers: austin
Subscribers: rwbarton, thomie
GHC Trac Issues: #14214
Differential Revision: https://phabricator.haskell.org/D4098
| |
This removes all dependencies the users guide had on `mkUserGuidePart`.
The generation of the flag reference table and the various pieces of the
man page is now entirely contained within the Sphinx extension
`flags.py`. You can see the man page generation on the orphan page
https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/ghc.html
The extension works by collecting all of the meta-data attached to the
`ghc-flag` directives and then formatting and displaying it at
`flag-print` directives. There is a single printing directive that can
be customized with two options, what format to display (table, list, or
block of flags) and an optional category to limit the output to
(verbosity, warnings, codegen, etc.).
New display formats can be added by creating a function
`generate_flag_xxx` (where `xxx` is a description of the format) which
takes a list of flags and a category and returns a new `xxx`. Then just
add a reference in the dispatch table `handlers`. That display can now
be run by passing `:type: xxx` to the `flag-print` directive.
`flags.py` contains two maps of settings that can be adjusted. The first
is a canonical list of flag categories, and the second sets default
categories for files.
The only functionality that Sphinx could not replace was the
`what_glasgow_exts_does.gen.rst` file. `mkUserGuidePart` actually just
reads the list of flags from `compiler/main/DynFlags.hs` which Sphinx
cannot do. As the flag is deprecated, I added the list as a static file
which can be updated manually.
Additionally, this patch updates every single documented flag with the
data from `mkUserGuidePart` to generate the reference table.
Fixes #11654 and, incidentally, #12155.
Reviewers: austin, bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #11654, #12155
Differential Revision: https://phabricator.haskell.org/D3839
| |
This patch does three things:
1.) It simplifies the flag parsing code in `conf.py` to properly display
flag definitions created by `.. (ghc|rts)-flag::`. Additionally, all flag
references must include the associated arguments. Documentation has been
added to `editing-guide.rst` to explain this.
2.) It normalizes all flag definitions to a similar format. Notably, all
instances of `<>` have been replaced with `⟨⟩`. All references across the
users guide have been updated to match.
3.) It fixes a couple issues with the flag reference table's generation code,
which did not handle comma separated flags in the same cell and did not
properly reference flags with arguments.
Test Plan:
`SPHINXOPTS = -n` to activate "nitpicky" mode, which reports all broken
references. All remaining errors are references to flags without any
documentation.
Reviewers: austin, bgamari
Reviewed By: bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #13980
Differential Revision: https://phabricator.haskell.org/D3778
| |
This fixes #12578.
Update links to SPJ's papers in the following files:
* compiler/coreSyn/CoreSyn.hs
* docs/users_guide/using-optimisation.rst
* docs/users_guide/parallel.rst
* docs/users_guide/glasgow_exts.rst
This commit is for ghc-8.2 branch.
Test Plan: build
Reviewers: austin, bgamari
Reviewed By: bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #12578
Differential Revision: https://phabricator.haskell.org/D3745
| |
In a previous change (commit 4fd6207ec6960c429e6a1bcbe0282f625010f52a),
the users guide was moved from XML to the RST format. This process
introduced a typo: "No -O*-type option specified:" was changed to "-O*"
(which is not correct). This change fixes it.
See result in: https://prnt.sc/fh332n
Fixes ticket #13756.
Reviewers: austin, bgamari
Reviewed By: bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #13756
Differential Revision: https://phabricator.haskell.org/D3631
| |
Including #13665.
| |
The idea is to implement a mechanism similar to PureScript, where they
suggest which identifiers in scope would fit the given hole. In
PureScript, they use subsumption (which is what we would like here as
well). For subsumption, we would have to check each type in scope
whether the hole is a subtype of the given type, but that would require
`tcSubType` and constraint satisfiability checking. Currently,
`TcSimplify` uses a lot of functions from `TcErrors`, so that would
require more of a rewrite. I will hold off on that for now and submit
the simpler type equality version.
As an example, consider
```
ps :: String -> IO ()
ps = putStrLn
ps2 :: a -> IO ()
ps2 _ = putStrLn "hello, world"
main :: IO ()
main = _ "hello, world"
```
The results would be something like
```
• Found hole: _ :: [Char] -> IO ()
• In the expression: _
  In a stmt of a 'do' block: _ "hello, world"
  In the expression:
    do _ "hello, world"
• Relevant bindings include
    main :: IO () (bound at test.hs:13:1)
    ps :: String -> IO () (bound at test.hs:7:1)
    ps2 :: forall a. a -> IO () (bound at test.hs:10:1)
  Valid substitutions include
    putStrLn :: String
                -> IO () (imported from ‘Prelude’ at
                          test.hs:1:1-14
                          (and originally defined in
                           ‘System.IO’))
    putStr :: String
              -> IO () (imported from ‘Prelude’ at
                        test.hs:1:1-14
                        (and originally defined in ‘System.IO’))
```
We'd like here for ps2 to be suggested as well, but for that we require
subsumption.
Reviewers: austin, bgamari, dfeuer, mpickering
Reviewed By: dfeuer, mpickering
Subscribers: mpickering, Wizek, dfeuer, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3361
| |
Reviewers: austin, bgamari
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3354
| |
I discovered that the dramatic improvement in perf/should_run/T9339
with the introduction of join points was really rather a fluke, and
very fragile.
The real problem (see Note [Making SpecConstr keener]) is that
SpecConstr wasn't specialising a function even though it was applied
to a freshly-allocated constructor. The paper describes plausible
reasons for this, but I think it may well be better to be a bit more
aggressive.
So this patch adds -fspec-constr-keen, which makes SpecConstr a bit
keener to specialise, by ignoring whether or not the argument
corresponding to a call pattern is scrutinised in the function body.
Now the gains in T9339 should be robust; and it might even be a
better default.
I'd be interested in what happens if we switched on -fspec-constr-keen
with -O2.
Reviewers: austin, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D3186
| |
This patch introduces a new flag `-fsolve-constant-dicts` which makes the
constraint solver solve super class constraints with available dictionaries if
possible. The flag is enabled by `-O1`.
The motivation of this patch is that the compiler can produce more efficient
code if the constraint solver used top-level instance declarations to solve
constraints that are currently solved by givens and their superclasses. In
particular, as it currently stands, the compiler imposes a performance penalty
on the common use-case where superclasses are bundled together for user
convenience. The performance penalty applies to constraint synonyms as
well. This example illustrates the issue:
```
{-# LANGUAGE ConstraintKinds, MultiParamTypeClasses, FlexibleContexts #-}
module B where
class M a b where m :: a -> b
type C a b = (Num a, M a b)
f :: C Int b => b -> Int -> Int
f _ x = x + 1
```
Output without the patch. Notice that we get the instance for `Num Int` by
using the class selector `p1`.
```
f :: forall b_arz. C Int b_arz => b_arz -> Int -> Int
f =
  \ (@ b_a1EB) ($d(%,%)_a1EC :: C Int b_a1EB) _ (eta1_B1 :: Int) ->
    + @ Int
      (GHC.Classes.$p1(%,%) @ (Num Int) @ (M Int b_a1EB) $d(%,%)_a1EC)
      eta1_B1
      B.f1
```
Output with the patch, nicely optimised code!
```
f :: forall b. C Int b => b -> Int -> Int
f =
  \ (@ b) _ _ (x_azg :: Int) ->
    case x_azg of { GHC.Types.I# x1_a1DP ->
    GHC.Types.I# (GHC.Prim.+# x1_a1DP 1#)
    }
```
Reviewers: simonpj, bgamari, austin
Reviewed By: simonpj
Subscribers: mpickering, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D2714
GHC Trac Issues: #12791, #5835
| |
Reviewers: austin, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D3007
GHC Trac Issues: #12979
| |
This CSE pass only targets data constructor applications. This is
probably the best we can do, as function calls and primitive operations
might have side-effects.
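A small example of the kind of code this is believed to help, written for this note rather than taken from the patch; `retag` is a made-up name. In typed Core the two `Right x` applications have different types, so they cannot be shared there, but at the STG level the rebuilt constructor application is identical to the scrutinised value.

```haskell
-- The first equation takes apart a 'Right x' and rebuilds 'Right x' at
-- a different type. After type erasure the new constructor application
-- is the same heap object shape as the argument, so a CSE pass over STG
-- can, in principle, reuse the argument instead of allocating again.
retag :: Either Int a -> Either Bool a
retag (Right x) = Right x
retag (Left _)  = Left False
```

Comparing the output of `-ddump-stg` with and without `-fstg-cse` should show whether the allocation is removed for a given GHC version.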
Introduces the flag -fstg-cse, enabled by default with -O for now. It
might also be a good candidate for -O2.
Differential Revision: https://phabricator.haskell.org/D2871
| |
This patch introduces new rules to perform constant folding through
case-expressions.
E.g.,
```
case t -# 10# of _ {      ===>   case t of _ {
         5#      -> e1              15#     -> e1
         8#      -> e2              18#     -> e2
         DEFAULT -> e               DEFAULT -> e
```
The initial motivation is that it allows "Merge Nested Cases"
optimization to kick in and to further simplify the code
(see Trac #12877).
Currently we recognize the following operations for Word# and Int#: Add,
Sub, Xor, Not and Negate (for Int# only).
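The rewrite can be modelled on a toy expression type. This sketch is not the rule code from the patch: `Scrut`, `Alt`, `Case`, `foldCase` and `shift` are hypothetical, only Add and Sub are covered, and `Integer` is used so the wrap-around behaviour of `Int#` and `Word#` is ignored.

```haskell
-- Toy scrutinee: a variable, possibly with literals added or subtracted.
data Scrut = Var String | Add Scrut Integer | Sub Scrut Integer
  deriving (Eq, Show)

-- Alternatives carry the name of their right-hand side only.
data Alt = LitAlt Integer String | DefaultAlt String
  deriving (Eq, Show)

data Case = Case Scrut [Alt]
  deriving (Eq, Show)

-- | Move literal arithmetic from the scrutinee into the alternatives:
--   case (s - k) of { n -> e }   becomes   case s of { n + k -> e }
--   case (s + k) of { n -> e }   becomes   case s of { n - k -> e }
foldCase :: Case -> Case
foldCase (Case (Sub s k) alts) = foldCase (Case s (map (shift k) alts))
foldCase (Case (Add s k) alts) = foldCase (Case s (map (shift (negate k)) alts))
foldCase c                     = c

-- | Adjust one literal alternative; DEFAULT is left alone.
shift :: Integer -> Alt -> Alt
shift k (LitAlt n rhs) = LitAlt (n + k) rhs
shift _ alt            = alt
```

Applied to the example above, `foldCase` turns the alternatives `5` and `8` under `t - 10` into `15` and `18` under `t`.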
Test Plan: validate
Reviewers: simonpj, austin, bgamari
Reviewed By: simonpj, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2762
GHC Trac Issues: #12877
| |
Previously the flag was silently ignored due to #7679 and #8657. This,
however, seems unnecessarily brutal and makes experimentation unduly
difficult for users.
Test Plan: Validate
Reviewers: austin, simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2335
GHC Trac Issues: #7679, #8657
| |
Non-exhaustive pattern warnings had their number of patterns to
show hardcoded in the past. This patch implements the TODO remark
that this should be made a command line flag.
-fmax-uncovered-patterns=<n>
can now be used to influence the number of patterns to be shown.
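A small module to try the flag on; `Colour` and `isRed` are invented for this example. `isRed` deliberately leaves three constructors uncovered, so with a limit of 2 the warning should list at most two of them.

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns -fmax-uncovered-patterns=2 #-}

data Colour = Red | Green | Blue | Yellow
  deriving (Eq, Show)

-- Deliberately non-exhaustive: Green, Blue and Yellow are not matched,
-- so compiling this module triggers the incomplete-patterns warning.
isRed :: Colour -> Bool
isRed Red = True
```

Raising or lowering the number in the pragma changes how many of the missing patterns the warning spells out.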
Reviewers: hvr, austin, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2076
| |
when in fact it does. This was pointed out by Johannes Bechberger and
supported with seemingly statistically sound evidence in his Bachelor
thesis: Of the benchmark shootout programs, 80% benefit significantly by
switching from -O to -O2.
See https://uqudy.serpens.uberspace.de/blog/2016/02/08/ghc-performance-over-time/
for a few raw numbers.
Differential Revision: https://phabricator.haskell.org/D2065
| |
And GHCi commands. This makes cross-referencing much easier.
Also normalize markup a bit and add some missing flags.