summaryrefslogtreecommitdiff
path: root/testsuite/tests/perf
Commit message (Collapse)AuthorAgeFilesLines
* Fix all broken perf tests on x64 WindowsTamar Christina2017-02-214-5/+25
| | | | | | | | | | | | | various perf tests have been broken over the course of the past few months. This updates the numbers. Test Plan: ./validate Reviewers: austin, bgamari Subscribers: thomie, #ghc_windows_task_force Differential Revision: https://phabricator.haskell.org/D3160
* testsuite: Fix allocations of T10547Ben Gamari2017-02-211-1/+1
| | | | | Previously the comment was correct, but the expected value itself was never updated.
* Bump a few more performance regressions from Type-indexed TypeableBen Gamari2017-02-201-2/+4
| | | | | These are right on the edge of acceptance and are only reproducible on a stressed machine.
* Disable Typeable binding generation for unboxed sumsBen Gamari2017-02-181-2/+2
| | | | | These things are simply too expensive to generate at the moment. More work is needed here; see #13276 and #13261.
* Type-indexed TypeableBen Gamari2017-02-184-56/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This at long last realizes the ideas for type-indexed Typeable discussed in A Reflection on Types (#11011). The general sketch of the project is described on the Wiki (Typeable/BenGamari). The general idea is that we are adding a type index to `TypeRep`, data TypeRep (a :: k) This index allows the typechecker to reason about the type represented by the `TypeRep`. This index representation mechanism is exposed as `Type.Reflection`, which also provides a number of patterns for inspecting `TypeRep`s, ```lang=haskell pattern TRFun :: forall k (fun :: k). () => forall (r1 :: RuntimeRep) (r2 :: RuntimeRep) (arg :: TYPE r1) (res :: TYPE r2). (k ~ Type, fun ~~ (arg -> res)) => TypeRep arg -> TypeRep res -> TypeRep fun pattern TRApp :: forall k2 (t :: k2). () => forall k1 (a :: k1 -> k2) (b :: k1). (t ~ a b) => TypeRep a -> TypeRep b -> TypeRep t -- | Pattern match on a type constructor. pattern TRCon :: forall k (a :: k). TyCon -> TypeRep a -- | Pattern match on a type constructor including its instantiated kind -- variables. pattern TRCon' :: forall k (a :: k). TyCon -> [SomeTypeRep] -> TypeRep a ``` In addition, we give the user access to the kind of a `TypeRep` (#10343), typeRepKind :: TypeRep (a :: k) -> TypeRep k Moreover, all of this plays nicely with 8.2's levity polymorphism, including the newly levity polymorphic (->) type constructor. Library changes --------------- The primary change here is the introduction of a Type.Reflection module to base. This module provides access to the new type-indexed TypeRep introduced in this patch. We also continue to provide the unindexed Data.Typeable interface, which is simply a type synonym for the existentially quantified SomeTypeRep, data SomeTypeRep where SomeTypeRep :: TypeRep a -> SomeTypeRep Naturally, this change also touched Data.Dynamic, which can now export the Dynamic data constructor. Moreover, I removed a blanket reexport of Data.Typeable from Data.Dynamic (which itself doesn't even import Data.Typeable now). We also add a kind heterogeneous type equality type, (:~~:), to Data.Type.Equality. Implementation -------------- The implementation strategy is described in Note [Grand plan for Typeable] in TcTypeable. None of it was difficult, but it did exercise a number of parts of the new levity polymorphism story which had not yet been exercised, which took some sorting out. The rough idea is that we augment the TyCon produced for each type constructor with information about the constructor's kind (which we call a KindRep). This allows us to reconstruct the monomorphic result kind of an particular instantiation of a type constructor given its kind arguments. Unfortunately all of this takes a fair amount of work to generate and send through the compilation pipeline. In particular, the KindReps can unfortunately get quite large. Moreover, the simplifier will float out various pieces of them, resulting in numerous top-level bindings. Consequently we mark the KindRep bindings as noinline, ensuring that the float-outs don't make it into the interface file. This is important since there is generally little benefit to inlining KindReps and they would otherwise strongly affect compiler performance. Performance ----------- Initially I was hoping to also clear up the remaining holes in Typeable's coverage by adding support for both unboxed tuples (#12409) and unboxed sums (#13276). While the former was fairly straightforward, the latter ended up being quite difficult: while the implementation can support them easily, enabling this support causes thousands of Typeable bindings to be emitted to the GHC.Types as each arity-N sum tycon brings with it N promoted datacons, each of which has a KindRep whose size which itself scales with N. Doing this was simply too expensive to be practical; consequently I've disabled support for the time being. Even after disabling sums this change regresses compiler performance far more than I would like. In particular there are several testcases in the testsuite which consist mostly of types which regress by over 30% in compiler allocations. These include (considering the "bytes allocated" metric), * T1969: +10% * T10858: +23% * T3294: +19% * T5631: +41% * T6048: +23% * T9675: +20% * T9872a: +5.2% * T9872d: +12% * T9233: +10% * T10370: +34% * T12425: +30% * T12234: +16% * 13035: +17% * T4029: +6.1% I've spent quite some time chasing down the source of this regression and while I was able to make som improvements, I think this approach of generating Typeable bindings at time of type definition is doomed to give us unnecessarily large compile-time overhead. In the future I think we should consider moving some of all of the Typeable binding generation logic back to the solver (where it was prior to 91c6b1f54aea658b0056caec45655475897f1972). I've opened #13261 documenting this proposal.
* Generalize kind of the (->) tyconBen Gamari2017-02-182-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is generalizes the kind of `(->)`, as discussed in #11714. This involves a few things, * Generalizing the kind of `funTyCon`, adding two new `RuntimeRep` binders, ```lang=haskell (->) :: forall (r1 :: RuntimeRep) (r2 :: RuntimeRep) (a :: TYPE r1) (b :: TYPE r2). a -> b -> * ``` * Unsaturated applications of `(->)` are expressed as explicit `TyConApp`s * Saturated applications of `(->)` are expressed as `FunTy` as they are currently * Saturated applications of `(->)` are expressed by a new `FunCo` constructor in coercions * `splitTyConApp` needs to ensure that `FunTy`s are split to a `TyConApp` of `(->)` with the appropriate `RuntimeRep` arguments * Teach CoreLint to check that all saturated applications of `(->)` are represented with `FunTy` At the moment I assume that `Constraint ~ *`, which is an annoying source of complexity. This will be simplified once D3023 is resolved. Also, this introduces two known regressions, `tcfail181`, `T10403` ===================== Only shows the instance, instance Monad ((->) r) -- Defined in ‘GHC.Base’ in its error message when -fprint-potential-instances is used. This is because its instance head now mentions 'LiftedRep which is not in scope. I'm not entirely sure of the right way to fix this so I'm just accepting the new output for now. T5963 (Typeable) ================ T5963 is now broken since Data.Typeable.Internals.mkFunTy computes its fingerprint without the RuntimeRep variables that (->) expects. This will be fixed with the merge of D2010. Haddock performance =================== The `haddock.base` and `haddock.Cabal` tests regress in allocations by about 20%. This certainly hurts, but it's also not entirely unexpected: the size of every function type grows with this patch and Haddock has a lot of functions in its heap.
* Better perf for haddock.base, haddock.CabalSimon Peyton Jones2017-02-161-2/+4
| | | | | | | | | | | I think this is due to commit 6bab649bde653f13c15eba30d5007bef4a9a9d3a Author: Simon Peyton Jones <simonpj@microsoft.com> Date: Thu Feb 16 09:42:32 2017 +0000 Improve checking of joins in Core Lint Improvement is around 5%.
* Check local type family instances against all imported onesReid Barton2017-02-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We previously checked type family instance declarations in a module for consistency with all instances that we happened to have read into the EPS or HPT. It was possible to arrange that an imported type family instance (used by an imported function) was in a module whose interface file was never read during compilation; then we wouldn't check consistency of local instances with this imported instance and as a result type safety was lost. With this patch, we still check consistency of local type family instances with all type family instances that we have loaded; but we make sure to load the interface files of all our imports that define family instances first. More selective consistency checking is left to #13102. On the other hand, we can now safely assume when we import a module that it has been checked for consistency with its imports. So we can save checking in checkFamInstConsistency, and overall we should have less work to do now. This patch also adds a note describing the Plan for ensuring type family consistency. Test Plan: Two new tests added; harbormaster Reviewers: austin, simonpj, bgamari Reviewed By: simonpj, bgamari Subscribers: ggreif, thomie Differential Revision: https://phabricator.haskell.org/D2992
* Improve the Occurrence Analyzer’s handling of one-shot functionsJoachim Breitner2017-02-111-45/+48
| | | | | | | | | | | | | | | | | | | | | When determining whether an expression is used saturatedly, count the number of value arguments that the occurrence analyser sees, and add the number of one-shot arguments that we know (from the strictness analyser) are passed from the context. perf results suggest no noticable change in allocations, reduction of code sizes, and performance regression possibliy due to loss of join points. Test Plan: perf.haskell.org Reviewers: simonpj, austin, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D3089
* testsuite: Bump bytes allocated for T5837Ben Gamari2017-02-071-1/+4
| | | | | | | | | Simon decreased this earlier today but Harbormaster doesn't reproduce his number. I've done two things here: 1. increased the allocations number to the Harbormaster value 2. increased the acceptance threshold from 5% to 7%, since Simon saw a 6.6% change in his environment.
* testsuite: Bump max_bytes_used for T4029Ben Gamari2017-02-071-1/+2
| | | | Unfortunately it's not clear what regressed this.
* Another improvement to SetLevelsSimon Peyton Jones2017-02-071-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In my recent commit commit 432f952ef64641be9f32152a0fbf2b8496d8fe9c Float unboxed expressions by boxing I changed how float_me in lvlMFE worked. That was right, but it exposed another bug: an error expression wasn't getting floated as it should from a case alternative. And that led to a collection of minor improvements * I found a much better way to cast it, by using lvlFloatRhs for top-level bindinds as well as nested ones, which is (a) more consistent and (b) works correctly. See Note [Floating from a RHS] * I also found some delicacy in the "floating to the top" stuff, so I greatly elaborated the Note [Floating to the top]. * I simplified the "bottoming-float" stuff; the change is in the treatment of bottoming lambdas (\x y. error blah), where we now float the (error blah) part instead of the whole lambda (which risks just making duplicate lambdas. See Note [Bottoming floats], esp (2). Perf effects are minor. * perf/compiler/T13056 improved sligtly (about 2%) in compiler allocations. Also T9233 improved by 1%. I'm not sure why. * Some small nofib changes: - Generally some very small reductions in run-time allocation, except k-nucleotide, which halves for some reason. (I did try to look but it's a big complicated function and it was far from obvious. Had it been a loss I would have looked harder! NB: there's a nearby patch "Do not inline bottoming things" that could also be responsible for either or both. I didn't think it was worth more testing to distinguish. -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- grep +0.1% -0.2% 0.00 0.00 +0.0% mandel -0.1% -1.4% 0.13 0.13 +0.0% k-nucleotide +0.1% -51.6% -1.0% -1.0% +0.0% -------------------------------------------------------------------------------- Min -0.3% -51.6% -9.4% -9.1% -4.0% Max +0.2% +0.0% +31.8% +32.7% +0.0% Geometric Mean -0.0% -0.8% +1.4% +1.4% -0.1%
* Derive <$David Feuer2017-02-072-0/+39
| | | | | | | | | | | | | | | | Using the default definition of `<$` for derived `Functor` instance is very bad for recursive data types. Derive the definition instead. Fixes #13218 Reviewers: austin, bgamari, RyanGlScott Reviewed By: RyanGlScott Subscribers: RyanGlScott, thomie Differential Revision: https://phabricator.haskell.org/D3072
* Do Worker/Wrapper for NOINLINE thingsEric Seidel2017-02-052-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disabling worker/wrapper for NOINLINE things can cause unnecessary reboxing of values. Consider {-# NOINLINE f #-} f :: Int -> a f x = error (show x) g :: Bool -> Bool -> Int -> Int g True True p = f p g False True p = p + 1 g b False p = g b True p the strictness analysis will discover f and g are strict, but because f has no wrapper, the worker for g will rebox p. So we get $wg x y p# = let p = I# p# in -- Yikes! Reboxing! case x of False -> case y of False -> $wg False True p# True -> +# p# 1# True -> case y of False -> $wg True True p# True -> case f p of { } g x y p = case p of (I# p#) -> $wg x y p# Now, in this case the reboxing will float into the True branch, an so the allocation will only happen on the error path. But it won't float inwards if there are multiple branches that call (f p), so the reboxing will happen on every call of g. Disaster. Solution: do worker/wrapper even on NOINLINE things; but move the NOINLINE pragma to the worker. Test Plan: make test TEST="13143" Reviewers: simonpj, bgamari, dfeuer, austin Reviewed By: simonpj, bgamari Subscribers: dfeuer, thomie Differential Revision: https://phabricator.haskell.org/D3046
* testsuite: Update expected values for T13035 and T12234Ben Gamari2017-02-051-2/+4
|
* Bump performance mark for T9020Joachim Breitner2017-02-031-1/+2
| | | | | | according to the graph at perf.haskell.org, it has regressed due to join points, which moved it very very close to the +10% mark and hence made it fail just sometimes.
* testsuite: Update allocations for T12234Ben Gamari2017-02-011-1/+2
| | | | This has been failing on Darwin since 748b79741652028827b6225c36b8ab55d22bdeb0.
* Join pointsLuke Maurer2017-02-0116-10/+308
| | | | | | | | | | | | | | | | | | | This major patch implements Join Points, as described in https://ghc.haskell.org/trac/ghc/wiki/SequentCore. You have to read that page, and especially the paper it links to, to understand what's going on; but it is very cool. It's Luke Maurer's work, but done in close collaboration with Simon PJ. This Phab is a squash-merge of wip/join-points branch of http://github.com/lukemaurer/ghc. There are many, many interdependent changes. Reviewers: goldfire, mpickering, bgamari, simonmar, dfeuer, austin Subscribers: simonpj, dfeuer, mpickering, Mikolaj, thomie Differential Revision: https://phabricator.haskell.org/D2853
* Use top-level instances to solve superclasses where possibleDaniel Haraj2017-01-315-4/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a new flag `-fsolve-constant-dicts` which makes the constraint solver solve super class constraints with available dictionaries if possible. The flag is enabled by `-O1`. The motivation of this patch is that the compiler can produce more efficient code if the constraint solver used top-level instance declarations to solve constraints that are currently solved givens and their superclasses. In particular, as it currently stands, the compiler imposes a performance penalty on the common use-case where superclasses are bundled together for user convenience. The performance penalty applies to constraint synonyms as well. This example illustrates the issue: ``` {-# LANGUAGE ConstraintKinds, MultiParamTypeClasses, FlexibleContexts #-} module B where class M a b where m :: a -> b type C a b = (Num a, M a b) f :: C Int b => b -> Int -> Int f _ x = x + 1 ``` Output without the patch, notice that we get the instance for `Num Int` by using the class selector `p1`. ``` f :: forall b_arz. C Int b_arz => b_arz -> Int -> Int f = \ (@ b_a1EB) ($d(%,%)_a1EC :: C Int b_a1EB) _ (eta1_B1 :: Int) -> + @ Int (GHC.Classes.$p1(%,%) @ (Num Int) @ (M Int b_a1EB) $d(%,%)_a1EC) eta1_B1 B.f1 ``` Output with the patch, nicely optimised code! ``` f :: forall b. C Int b => b -> Int -> Int f = \ (@ b) _ _ (x_azg :: Int) -> case x_azg of { GHC.Types.I# x1_a1DP -> GHC.Types.I# (GHC.Prim.+# x1_a1DP 1#) } ``` Reviewers: simonpj, bgamari, austin Reviewed By: simonpj Subscribers: mpickering, rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D2714 GHC Trac Issues: #12791, #5835
* testsuite: Bump compiler allocations of T5837Ben Gamari2017-01-241-1/+3
| | | | | Gipeda suggests that this is due to the recent top-level string literals in Core patch.
* Record evaluated-ness on workers and wrappersSimon Peyton Jones2017-01-231-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch is a refinement of the original commit (which was reverted): commit 6b976eb89fe72827f226506d16d3721ba4e28bab Date: Fri Jan 13 08:56:53 2017 +0000 Record evaluated-ness on workers and wrappers In Trac #13027, comment:20, I noticed that wrappers created after demand analysis weren't recording the evaluated-ness of strict constructor arguments. In the ticket that led to a (debatable) Lint error but in general the more we know about evaluated-ness the better we can optimise. This commit adds that info * both in the worker (on args) * and in the wrapper (on CPR result patterns). See Note [Record evaluated-ness in worker/wrapper] in WwLib On the way I defined Id.setCaseBndrEvald, and used it to shorten the code in a few other places Then I added test T13077a to test the CPR aspect of this patch, but I found that Lint failed! Reason: simpleOptExpr was discarding evaluated-ness info on lambda binders because zapFragileIdInfo was discarding an Unfolding of (OtherCon _). But actually that's a robust unfolding; there is no need to discard it. To fix this: * zapFragileIdInfo only zaps fragile unfoldings * Replace isClosedUnfolding with isFragileUnfolding (the latter is just the negation of the former, but the nomenclature is more consistent). Better documentation too Note [Fragile unfoldings] * And Simplify.simplLamBndr can now look at isFragileUnfolding to decide whether to use the longer route of simplUnfolding. For some reason perf/compiler/T9233 improves in compile-time allocation by 10%. Hooray Nofib: essentially no change: -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- cacheprof +0.0% -0.3% +0.9% +0.4% +0.0% -------------------------------------------------------------------------------- Min +0.0% -0.3% -2.4% -2.4% +0.0% Max +0.0% +0.0% +9.8% +11.4% +2.4% Geometric Mean +0.0% -0.0% +1.1% +1.0% +0.0%
* Remove clean_cmd and extra_clean usage from .T filesThomas Miedema2017-01-221-14/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | The `clean_cmd` and `extra_clean` setup functions don't do anything. Remove them from .T files. Created using https://github.com/thomie/refactor-ghc-testsuite. This diff is a test for the .T-file parser/processor/pretty-printer in that repository. find . -name '*.T' -exec ~/refactor-ghc-testsuite/Main "{}" \; Tests containing inline comments or multiline strings are not modified. Preparation for #12223. Test Plan: Harbormaster Reviewers: austin, hvr, simonmar, mpickering, bgamari Reviewed By: mpickering Subscribers: mpickering Differential Revision: https://phabricator.haskell.org/D3000 GHC Trac Issues: #12223
* testsuite: Bump allocations on T5321Fun and T12707Ben Gamari2017-01-221-2/+9
| | | | | These are only failing on Darwin, strangely enough, but do so quite reproducibly.
* Allow top-level string literals in Core (#8472)Takano Akio2017-01-205-6/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commits relaxes the invariants of the Core syntax so that a top-level variable can be bound to a primitive string literal of type Addr#. This commit: * Relaxes the invatiants of the Core, and allows top-level bindings whose type is Addr# as long as their RHS is either a primitive string literal or another variable. * Allows the simplifier and the full-laziness transformer to float out primitive string literals to the top leve. * Introduces the new StgGenTopBinding type to accomodate top-level Addr# bindings. * Introduces a new type of labels in the object code, with the suffix "_bytes", for exported top-level Addr# bindings. * Makes some built-in rules more robust. This was necessary to keep them functional after the above changes. This is a continuation of D2554. Rebasing notes: This had two slightly suspicious performance regressions: * T12425: bytes allocated regressed by roughly 5% * T4029: bytes allocated regressed by a bit over 1% * T13035: bytes allocated regressed by a bit over 5% These deserve additional investigation. Rebased by: bgamari. Test Plan: ./validate --slow Reviewers: goldfire, trofi, simonmar, simonpj, austin, hvr, bgamari Reviewed By: trofi, simonpj, bgamari Subscribers: trofi, simonpj, gridaphobe, thomie Differential Revision: https://phabricator.haskell.org/D2605 GHC Trac Issues: #8472
* testsuite: Bump allocations for T12234Ben Gamari2017-01-171-3/+4
| | | | Unfortunately it's not clear why but this has been failing on Harbormaster.
* Typos in manual, comments and testsGabor Greif2017-01-121-1/+1
|
* Inline partially-applied wrappersDavid Feuer2017-01-102-0/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Suppose we have ``` data Node a = Node2 !Int a a | Node3 !Int a a a instance Traversable Node where traverse f (Node2 s x y) = Node2 s <$> f x <*> f y ... ``` Since `Node2` is partially applied, we wouldn't inline its wrapper. The result was that we'd box up the `Int#` to put the box in the closure passed to `fmap`. We now allow the wrapper to inline when partially applied, so GHC stores the `Int#` directly in the closure. Reviewers: rwbarton, mpickering, simonpj, austin, bgamari Reviewed By: simonpj, bgamari Subscribers: mpickering, thomie Differential Revision: https://phabricator.haskell.org/D2891 GHC Trac Issues: #12990
* Mark *FB functions INLINE[0] (Fixes #13001)Takano Akio2017-01-103-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When fusion rules successfully fire, we are left with calls to *FB functions. They are higher-order functions, and therefore they often benefit from inlining. This is particularly important when then final consumer is a strict fold (foldl', length, etc.), because not inlining these functions means allocating a function closure for each element in the list, which often is more costly than what fusion eliminates. Nofib shows a slight increase in the binary size: ------------------------------------------------------------------------ Program Size Allocs Runtime Elapsed TotalMem ------------------------------------------------------------------------ gen_regexps -0.3% 0.0% 0.000 0.000 0.0% puzzle +0.8% 0.0% 0.089 0.090 0.0% reptile +0.8% -0.0% 0.008 0.008 0.0% ------------------------------------------------------------------------ Min -0.3% -0.0% -7.3% -7.1% 0.0% Max +0.8% +0.0% +7.8% +7.7% +1.8% Geometric Mean +0.0% -0.0% +0.2% +0.2% +0.0% ------------------------------------------------------------------------ Reviewers: simonpj, austin, hvr, bgamari Reviewed By: simonpj Subscribers: simonpj, thomie Differential Revision: https://phabricator.haskell.org/D2951 GHC Trac Issues: #13001
* testsuite: Add performance testcase from #12707Ben Gamari2017-01-102-0/+204
|
* Actually add the right file for T13035 stderrMatthew Pickering2017-01-061-1/+4
|
* Add missing stderr file for T13035Matthew Pickering2017-01-061-1/+1
|
* Add performance test for #13056Ryan Scott2017-01-062-1/+36
| | | | | | This performance regression was fixed by commit 517d03e41b4f5c144d1ad684539340421be2be2a (#12234). Let's add a performance test to ensure that it doesn't break again.
* Fix the implementation of the "push rules"Simon Peyton Jones2017-01-061-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Richard pointed out (comment:12 of Trac #13025) that my implementation of the coercion "push rules", newly added in exprIsConAppMaybe by commit b4c3a66, wasn't quite right. But in fact that means that the implementation of those same rules in Simplify.simplCast was wrong too. Hence this commit: * Refactor the push rules so they are implemented in just one place (CoreSubst.pushCoArgs, pushCoTyArg, pushCoValArg) The code in Simplify gets simpler, which is nice. * Fix the bug that Richard pointed out (to do with hetero-kinded coercions) Then compiler performance worsened, which led mt do discover two performance bugs: * The smart constructor Coercion.mkNthCo didn't have a case for ForAllCos, which meant we stupidly build a complicated coercion where a simple one would do * In OptCoercion there was one place where we used CoherenceCo (the data constructor) rather than mkCoherenceCo (the smart constructor), which meant that the the stupid complicated coercion wasn't optimised away For reasons I don't fully understand, T5321Fun did 2% less compiler allocation after all this, which is good.
* Avoid exponential blowup in FamInstEnv.normaliseTypeSimon Peyton Jones2017-01-063-0/+155
| | | | | | Trac #13035 showed up a nasty case where we took exponentially long to normalise a (actually rather simple) type. Fortunately it was easy to fix: see Note [Normalisation and type synonyms].
* Fix test for T12877Sylvain Henry2016-12-232-135/+0
| | | | | | | | | | | | Summary: See https://phabricator.haskell.org/rGHCd3b546b1a605 Reviewers: nomeata, austin, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2883
* testsuite: Split out Windows allocations numbers for T12234Ben Gamari2016-12-231-1/+3
|
* Support for abi-depends for computing shadowing.Edward Z. Yang2016-12-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is a complete fix based off of ed7af26606b3a605a4511065ca1a43b1c0f3b51d for handling shadowing and out-of-order -package-db flags simultaneously. The general strategy is we first put all databases together, overriding packages as necessary. Once this is done, we successfully prune out broken packages, including packages which depend on a package whose ABI differs from the ABI we need. Our check gracefully degrades in the absence of abi-depends, as we only check deps which are recorded in abi-depends. Contains time and Cabal submodule update. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu> Test Plan: validate Reviewers: niteria, austin, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2846 GHC Trac Issues: #12485
* Improved perf for T12227Simon Peyton Jones2016-12-211-1/+3
| | | | | | | | | | | | | | | Improved compiler allocations by abut 5%. It comes from one of 1a4c04b1 Fix 'SPECIALISE instance' c48595ee Never apply worker/wrapper to DFuns 05d233e8 Move InId/OutId to CoreSyn e07ad4db Don't eta-expand in stable unfoldings d250d493 Add INLINE pragamas on Traversable default methods c66dd05c Move typeSize/coercionSize into TyCoRep I think d250d493. But it's good anyway.
* Test Trac #12996Simon Peyton Jones2016-12-203-0/+60
|
* testsuite: Separate out Windows results for T5205Ben Gamari2016-12-151-1/+6
| | | | | | | | | | | | | | | | | | | | | This test seems to have much different allocation behavior on Windows and Linux. Previously we had widened the acceptance window to 7% to accomodate this, but even this isn't enough any more. Instead of further widening the window let's just give an expected number for each platform. Really, this is precisely the issue with our performance testing model which I've been complaining about in #12758. Fixes test for #5205 on 64-bit Windows. Test Plan: Validate on Windows Reviewers: austin Subscribers: thomie, Phyx Differential Revision: https://phabricator.haskell.org/D2848 GHC Trac Issues: #5205
* testsuite: Specify expected allocations of T12877 for WindowsBen Gamari2016-12-151-1/+5
| | | | | This deviated by 12% from the expected allocations on Windows. Yet another case of #12758.
* Scrutinee Constant FoldingSylvain Henry2016-12-093-0/+131
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces new rules to perform constant folding through case-expressions. E.g., ``` case t -# 10# of _ { ===> case t of _ { 5# -> e1 15# -> e1 8# -> e2 18# -> e2 DEFAULT -> e DEFAULT -> e ``` The initial motivation is that it allows "Merge Nested Cases" optimization to kick in and to further simplify the code (see Trac #12877). Currently we recognize the following operations for Word# and Int#: Add, Sub, Xor, Not and Negate (for Int# only). Test Plan: validate Reviewers: simonpj, austin, bgamari Reviewed By: simonpj, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2762 GHC Trac Issues: #12877
* Use isFamFreeTyCon now we have itSimon Peyton Jones2016-12-051-1/+3
| | | | Refactoring only
* Fix an asymptotic bug in the occurrence analyserSimon Peyton Jones2016-12-053-0/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Trac #12425 and #12234 showed up a major and long-standing bug in the occurrence analyser, whereby it could generate explonentially large program! There's a lot of commentary on #12425; and it's all described in Note [Loop breakers, node scoring, and stability] I did quite a lot of refactoring to make the code comprehensibe again (its structure had bit-rotted rather), so the patch looks bigger than it really is. Hurrah! I did a nofib run to check that I hadn't inadertently ruined anything: -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- fluid -0.3% -1.5% 0.01 0.01 +0.0% parser -0.9% +0.6% 0.04 0.04 +0.0% prolog -0.1% +1.2% 0.00 0.00 +0.0% -------------------------------------------------------------------------------- Min -0.9% -1.5% -8.6% -8.7% +0.0% Max +0.1% +1.2% +7.7% +7.8% +2.4% Geometric Mean -0.2% -0.0% -0.2% -0.3% +0.0% I checked what happened in 'prolog'. It seems that we have a recursive data structure something like this f :: [blah] f x = build (\cn. ...g... ) g :: [blah2] g y = ....(foldr k z (f y)).... If we inline 'f' into 'g' we get better fusion than the other way round, but we don't have any way to spot that at the moment. (I wonder if we could do worker/wrapper for functions returning a 'build'?) It was happening before by a fluke. Anyway I decided to accept this; it's relatively rare I think.
* testsuite: Actually update haddock.compiler allocationsBen Gamari2016-12-011-1/+1
| | | | The previous attempt updated the comment but not the value. Silly me.
* Update test output for WindowsTamar Christina2016-11-291-1/+2
| | | | | | | | | | | | | | | | | | | | | Following D2684 these two tests need to be updated: * T7037: timeout.exe now waits until all processes are finished. this makes T7037 reliable. So enabled. * T876: Unknown reason, allocations are much lower than before. Test Plan: ./validate Reviewers: austin, simonmar, bgamari Reviewed By: bgamari Subscribers: thomie, #ghc_windows_task_force Differential Revision: https://phabricator.haskell.org/D2759 GHC Trac Issues: #12725, #12004
* testsuite: Bump haddock.compiler allocationsBen Gamari2016-11-291-0/+1
| | | | Unfortunately it's quite unclear what caused this.
* Perf improvements in T6048, T10547Simon Peyton Jones2016-11-251-2/+6
| | | | | I think this wave of commits just made these two a little better; they must have been close to the threshold before.
* Another major constraint-solver refactoringSimon Peyton Jones2016-11-253-104/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch takes further my refactoring of the constraint solver, which I've been doing over the last couple of months in consultation with Richard. It fixes a number of tricky bugs that made the constraint solver actually go into a loop, including Trac #12526 Trac #12444 Trac #12538 The main changes are these * Flatten unification variables (fmvs/fuvs) appear on the LHS of a tvar/tyvar equality; thus fmv ~ alpha and not alpha ~ fmv See Note [Put flatten unification variables on the left] in TcUnify. This is implemented by TcUnify.swapOverTyVars. * Don't reduce a "loopy" CFunEqCan where the fsk appears on the LHS: F t1 .. tn ~ fsk where 'fsk' is free in t1..tn. See Note [FunEq occurs-check principle] in TcInteract This neatly stops some infinite loops that people reported; and it allows us to delete some crufty code in reduce_top_fun_eq. And it appears to be no loss whatsoever. As well as fixing loops, ContextStack2 and T5837 both terminate when they didn't before. * Previously we generated "derived shadow" constraints from Wanteds, but we could (and sometimes did; Trac #xxxx) repeatedly generate a derived shadow from the same Wanted. A big change in this patch is to have two kinds of Wanteds: [WD] behaves like a pair of a Wanted and a Derived [W] behaves like a Wanted only See CtFlavour and ShadowInfo in TcRnTypes, and the ctev_nosh field of a Wanted. This turned out to be a lot simpler. A [WD] gets split into a [W] and a [D] in TcSMonad.maybeEmitShaodow. See TcSMonad Note [The improvement story and derived shadows] * Rather than have a separate inert_model in the InertCans, I've put the derived equalities back into inert_eqs. We weren't gaining anything from a separate field. * Previously we had a mode for the constraint solver in which it would more aggressively solve Derived constraints; it was used for simplifying the context of a 'deriving' clause, or a 'default' delcaration, for example. But the complexity wasn't worth it; now I just make proper Wanted constraints. See TcMType.cloneWC * Don't generate injectivity improvement for Givens; see Note [No FunEq improvement for Givens] in TcInteract * solveSimpleWanteds leaves the insolubles in-place rather than returning them. Simpler. I also did lots of work on comments, including fixing Trac #12821.
* testsuite: Fix creep of T4029Ben Gamari2016-11-141-1/+2
| | | | Ostensibly due to IfaceType rework, but I have my doubts.