diff options
author | Daniel Haraj <dan@obsidian.systems> | 2017-01-31 22:28:55 +0000 |
---|---|---|
committer | Matthew Pickering <matthewtpickering@gmail.com> | 2017-01-31 22:31:03 +0000 |
commit | 748b79741652028827b6225c36b8ab55d22bdeb0 (patch) | |
tree | 100d4c51d4a1a621f3de68a211d7ed047f84e498 | |
parent | c71f0c4ef931885a6c35d64e803338ba3781ff23 (diff) | |
download | haskell-748b79741652028827b6225c36b8ab55d22bdeb0.tar.gz |
Use top-level instances to solve superclasses where possible
This patch introduces a new flag `-fsolve-constant-dicts` which makes the
constraint solver solve super class constraints with available dictionaries if
possible. The flag is enabled by `-O1`.
The motivation of this patch is that the compiler can produce more efficient
code if the constraint solver used top-level instance declarations to solve
constraints that are currently solved givens and their superclasses. In
particular, as it currently stands, the compiler imposes a performance penalty
on the common use-case where superclasses are bundled together for user
convenience. The performance penalty applies to constraint synonyms as
well. This example illustrates the issue:
```
{-# LANGUAGE ConstraintKinds, MultiParamTypeClasses, FlexibleContexts #-}
module B where
class M a b where m :: a -> b
type C a b = (Num a, M a b)
f :: C Int b => b -> Int -> Int
f _ x = x + 1
```
Output without the patch, notice that we get the instance for `Num Int` by
using the class selector `p1`.
```
f :: forall b_arz. C Int b_arz => b_arz -> Int -> Int
f =
\ (@ b_a1EB) ($d(%,%)_a1EC :: C Int b_a1EB) _ (eta1_B1 :: Int) ->
+ @ Int
(GHC.Classes.$p1(%,%) @ (Num Int) @ (M Int b_a1EB) $d(%,%)_a1EC)
eta1_B1
B.f1
```
Output with the patch, nicely optimised code!
```
f :: forall b. C Int b => b -> Int -> Int
f =
\ (@ b) _ _ (x_azg :: Int) ->
case x_azg of { GHC.Types.I# x1_a1DP ->
GHC.Types.I# (GHC.Prim.+# x1_a1DP 1#)
}
```
Reviewers: simonpj, bgamari, austin
Reviewed By: simonpj
Subscribers: mpickering, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D2714
GHC Trac Issues: #12791, #5835
-rw-r--r-- | compiler/main/DynFlags.hs | 3 | ||||
-rw-r--r-- | compiler/typecheck/TcInteract.hs | 259 | ||||
-rw-r--r-- | compiler/typecheck/TcSMonad.hs | 16 | ||||
-rw-r--r-- | docs/users_guide/8.2.1-notes.rst | 9 | ||||
-rw-r--r-- | docs/users_guide/using-optimisation.rst | 31 | ||||
-rw-r--r-- | testsuite/tests/perf/should_run/T12791.hs | 15 | ||||
-rw-r--r-- | testsuite/tests/perf/should_run/T12791.stdout | 1 | ||||
-rw-r--r-- | testsuite/tests/perf/should_run/T5835.hs | 11 | ||||
-rw-r--r-- | testsuite/tests/perf/should_run/T5835.stdout | 1 | ||||
-rw-r--r-- | testsuite/tests/perf/should_run/all.T | 28 |
10 files changed, 355 insertions, 19 deletions
diff --git a/compiler/main/DynFlags.hs b/compiler/main/DynFlags.hs index 4c4002d83e..682480b44f 100644 --- a/compiler/main/DynFlags.hs +++ b/compiler/main/DynFlags.hs @@ -455,6 +455,7 @@ data GeneralFlag | Opt_Loopification -- See Note [Self-recursive tail calls] | Opt_CprAnal | Opt_WorkerWrapper + | Opt_SolveConstantDicts -- Interface files | Opt_IgnoreInterfacePragmas @@ -3661,6 +3662,7 @@ fFlagsDeps = [ flagSpec "vectorise" Opt_Vectorise, flagSpec "version-macros" Opt_VersionMacros, flagSpec "worker-wrapper" Opt_WorkerWrapper, + flagSpec "solve-constant-dicts" Opt_SolveConstantDicts, flagSpec "show-warning-groups" Opt_ShowWarnGroups, flagSpec "hide-source-paths" Opt_HideSourcePaths, flagSpec "show-hole-constraints" Opt_ShowHoleConstraints @@ -4042,6 +4044,7 @@ optLevelFlags -- see Note [Documenting optimisation flags] , ([1,2], Opt_UnboxSmallStrictFields) , ([1,2], Opt_CprAnal) , ([1,2], Opt_WorkerWrapper) + , ([1,2], Opt_SolveConstantDicts) , ([2], Opt_LiberateCase) , ([2], Opt_SpecConstr) diff --git a/compiler/typecheck/TcInteract.hs b/compiler/typecheck/TcInteract.hs index 2b66126fa4..33057cd4fe 100644 --- a/compiler/typecheck/TcInteract.hs +++ b/compiler/typecheck/TcInteract.hs @@ -59,6 +59,9 @@ import DynFlags import Util import qualified GHC.LanguageExtensions as LangExt +import Control.Monad.Trans.Class +import Control.Monad.Trans.Maybe + {- ********************************************************************** * * @@ -675,6 +678,154 @@ interactIrred _ wi = pprPanic "interactIrred" (ppr wi) interactDict * * ********************************************************************************* + +Note [Solving from instances when interacting Dicts] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +When we interact a [W] constraint with a [G] constraint that solves it, there is +a possibility that we could produce better code if instead we solved from a +top-level instance declaration (See #12791, #5835). For example: + + class M a b where m :: a -> b + + type C a b = (Num a, M a b) + + f :: C Int b => b -> Int -> Int + f _ x = x + 1 + +The body of `f` requires a [W] `Num Int` instance. We could solve this +constraint from the givens because we have `C Int b` and that provides us a +solution for `Num Int`. This would let us produce core like the following +(with -O2): + + f :: forall b. C Int b => b -> Int -> Int + f = \ (@ b) ($d(%,%) :: C Int b) _ (eta1 :: Int) -> + + @ Int + (GHC.Classes.$p1(%,%) @ (Num Int) @ (M Int b) $d(%,%)) + eta1 + A.f1 + +This is bad! We could do much better if we solved [W] `Num Int` directly from +the instance that we have in scope: + + f :: forall b. C Int b => b -> Int -> Int + f = \ (@ b) _ _ (x :: Int) -> + case x of { GHC.Types.I# x1 -> GHC.Types.I# (GHC.Prim.+# x1 1#) } + +However, there is a reason why the solver does not simply try to solve such +constraints with top-level instances. If the solver finds a relevant instance +declaration in scope, that instance may require a context that can't be solved +for. A good example of this is: + + f :: Ord [a] => ... + f x = ..Need Eq [a]... + +If we have instance `Eq a => Eq [a]` in scope and we tried to use it, we would +be left with the obligation to solve the constraint Eq a, which we cannot. So we +must be conservative in our attempt to use an instance declaration to solve the +[W] constraint we're interested in. Our rule is that we try to solve all of the +instance's subgoals recursively all at once. Precisely: We only attempt to +solve constraints of the form `C1, ... Cm => C t1 ... t n`, where all the Ci are +themselves class constraints of the form `C1', ... Cm' => C' t1' ... tn'` and we +only succeed if the entire tree of constraints is solvable from instances. + +An example that succeeds: + + class Eq a => C a b | b -> a where + m :: b -> a + + f :: C [Int] b => b -> Bool + f x = m x == [] + +We solve for `Eq [Int]`, which requires `Eq Int`, which we also have. This +produces the following core: + + f :: forall b. C [Int] b => b -> Bool + f = \ (@ b) ($dC :: C [Int] b) (x :: b) -> + GHC.Classes.$fEq[]_$s$c== + (m @ [Int] @ b $dC x) (GHC.Types.[] @ Int) + +An example that fails: + + class Eq a => C a b | b -> a where + m :: b -> a + + f :: C [a] b => b -> Bool + f x = m x == [] + +Which, because solving `Eq [a]` demands `Eq a` which we cannot solve, produces: + + f :: forall a b. C [a] b => b -> Bool + f = \ (@ a) (@ b) ($dC :: C [a] b) (eta :: b) -> + == + @ [a] + (A.$p1C @ [a] @ b $dC) + (m @ [a] @ b $dC eta) + (GHC.Types.[] @ a) + +This optimization relies on coherence of dictionaries to be correct. When we +cannot assume coherence because of IncoherentInstances then this optimization +can change the behavior of the users code. + +The following four modules produce a program whose output would change depending +on whether we apply this optimization when IncoherentInstances is in effect: + +######### + {-# LANGUAGE MultiParamTypeClasses #-} + module A where + + class A a where + int :: a -> Int + + class A a => C a b where + m :: b -> a -> a + +######### + {-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-} + module B where + + import A + + instance A a where + int _ = 1 + + instance C a [b] where + m _ = id + +######### + {-# LANGUAGE FlexibleInstances, MultiParamTypeClasses, FlexibleContexts #-} + {-# LANGUAGE IncoherentInstances #-} + module C where + + import A + + instance A Int where + int _ = 2 + + instance C Int [Int] where + m _ = id + + intC :: C Int a => a -> Int -> Int + intC _ x = int x + +######### + module Main where + + import A + import B + import C + + main :: IO () + main = print (intC [] (0::Int)) + +The output of `main` if we avoid the optimization under the effect of +IncoherentInstances is `1`. If we were to do the optimization, the output of +`main` would be `2`. + +It is important to emphasize that failure means that we don't produce more +efficient code, NOT that we fail to typecheck at all! This is purely an +an optimization: exactly the same programs should typecheck with or without this +procedure. + -} interactDict :: InertCans -> Ct -> TcS (StopOrContinue Ct) @@ -703,18 +854,29 @@ interactDict inerts workItem@(CDictCan { cc_ev = ev_w, cc_class = cls, cc_tyargs ; let ev_cs = EvCsPushCall func (ctLocSpan loc) (getEvTerm mb_new) ; solveCallStack ev_w ev_cs ; stopWith ev_w "Wanted CallStack IP" } - | Just ctev_i <- lookupInertDict inerts cls tys - = do { (inert_effect, stop_now) <- solveOneFromTheOther ctev_i ev_w - ; case inert_effect of - IRKeep -> return () - IRDelete -> updInertDicts $ \ ds -> delDict ds cls tys - IRReplace -> updInertDicts $ \ ds -> addDict ds cls tys workItem - ; if stop_now then - return (Stop ev_w (text "Dict equal" <+> parens (ppr inert_effect))) - else - continueWith workItem } - + = do + { dflags <- getDynFlags + -- See Note [Solving from instances when interacting Dicts] + ; try_inst_res <- trySolveFromInstance dflags ev_w ctev_i + ; case try_inst_res of + Just evs -> do + { flip mapM_ evs $ \(ev_t, ct_ev, cls, typ) -> do + { setWantedEvBind (ctEvId ct_ev) ev_t + ; addSolvedDict ct_ev cls typ } + ; stopWith ev_w "interactDict/solved from instance" } + -- We were unable to solve the [W] constraint from in-scope instances so + -- we solve it from the solution in the inerts we just retrieved. + Nothing -> do + { (inert_effect, stop_now) <- solveOneFromTheOther ctev_i ev_w + ; case inert_effect of + IRKeep -> return () + IRDelete -> updInertDicts $ \ ds -> delDict ds cls tys + IRReplace -> updInertDicts $ \ ds -> addDict ds cls tys workItem + ; if stop_now then + return $ Stop ev_w (text "Dict equal" <+> parens (ppr inert_effect)) + else + continueWith workItem } } | cls `hasKey` ipClassKey , isGiven ev_w = interactGivenIP inerts workItem @@ -725,6 +887,81 @@ interactDict inerts workItem@(CDictCan { cc_ev = ev_w, cc_class = cls, cc_tyargs interactDict _ wi = pprPanic "interactDict" (ppr wi) +-- See Note [Solving from instances when interacting Dicts] +trySolveFromInstance :: DynFlags + -> CtEvidence -- Work item + -> CtEvidence -- Inert we want to try to replace + -> TcS (Maybe [(EvTerm, CtEvidence, Class, [TcPredType])]) + -- Everything we need to bind a solution for the work item + -- and add the solved Dict to the cache in the main solver. +trySolveFromInstance dflags ev_w ctev_i + | isWanted ev_w + && isGiven ctev_i + -- We are about to solve a [W] constraint from a [G] constraint. We take + -- a moment to see if we can get a better solution using an instance. + -- Note that we only do this for the sake of performance. Exactly the same + -- programs should typecheck regardless of whether we take this step or + -- not. See Note [Solving from instances when interacting Dicts] + && not (xopt LangExt.IncoherentInstances dflags) + -- If IncoherentInstances is on then we cannot rely on coherence of proofs + -- in order to justify this optimization: The proof provided by the + -- [G] constraint's superclass may be different from the top-level proof. + && gopt Opt_SolveConstantDicts dflags + -- Enabled by the -fsolve-constant-dicts flag + = runMaybeT $ try_solve_from_instance emptyDictMap ev_w + + | otherwise = return Nothing + where + -- This `CtLoc` is used only to check the well-staged condition of any + -- candidate DFun. Our subgoals all have the same stage as our root + -- [W] constraint so it is safe to use this while solving them. + loc_w = ctEvLoc ev_w + + -- Use a local cache of solved dicts while emitting EvVars for new work + -- We bail out of the entire computation if we need to emit an EvVar for + -- a subgoal that isn't a ClassPred. + new_wanted_cached :: DictMap CtEvidence -> TcPredType -> MaybeT TcS MaybeNew + new_wanted_cached cache pty + | ClassPred cls tys <- classifyPredType pty + = lift $ case findDict cache cls tys of + Just ctev -> return $ Cached (ctEvTerm ctev) + Nothing -> Fresh <$> newWantedNC loc_w pty + | otherwise = mzero + + -- MaybeT manages early failure if we find a subgoal that cannot be solved + -- from instances. + -- Why do we need a local cache here? + -- 1. We can't use the global cache because it contains givens that + -- we specifically don't want to use to solve. + -- 2. We need to be able to handle recursive super classes. The + -- cache ensures that we remember what we have already tried to + -- solve to avoid looping. + try_solve_from_instance + :: DictMap CtEvidence -> CtEvidence + -> MaybeT TcS [(EvTerm, CtEvidence, Class, [TcPredType])] + try_solve_from_instance cache ev + | ClassPred cls tys <- classifyPredType (ctEvPred ev) = do + -- It is important that we add our goal to the cache before we solve! + -- Otherwise we may end up in a loop while solving recursive dictionaries. + { let cache' = addDict cache cls tys ev + ; inst_res <- lift $ match_class_inst dflags cls tys loc_w + ; case inst_res of + GenInst { lir_new_theta = preds + , lir_mk_ev = mk_ev + , lir_safe_over = safeOverlap } + | safeOverlap -> do + -- emit work for subgoals but use our local cache so that we can + -- solve recursive dictionaries. + { evc_vs <- mapM (new_wanted_cached cache') preds + ; subgoalBinds <- mapM (try_solve_from_instance cache') + (freshGoals evc_vs) + ; return $ (mk_ev (map getEvTerm evc_vs), ev, cls, preds) + : concat subgoalBinds } + + | otherwise -> mzero + _ -> mzero } + | otherwise = mzero + addFunDepWork :: InertCans -> CtEvidence -> Class -> TcS () -- Add derived constraints from type-class functional dependencies. addFunDepWork inerts work_ev cls diff --git a/compiler/typecheck/TcSMonad.hs b/compiler/typecheck/TcSMonad.hs index d80fea1fda..8b6a816dea 100644 --- a/compiler/typecheck/TcSMonad.hs +++ b/compiler/typecheck/TcSMonad.hs @@ -30,7 +30,7 @@ module TcSMonad ( newTcEvBinds, newWantedEq, emitNewWantedEq, - newWanted, newWantedEvVar, newWantedEvVarNC, newDerivedNC, + newWanted, newWantedEvVar, newWantedNC, newWantedEvVarNC, newDerivedNC, newBoundEvVarId, unifyTyVar, unflattenFmv, reportUnifications, setEvBind, setWantedEq, setEqIfWanted, @@ -67,8 +67,8 @@ module TcSMonad ( getSafeOverlapFailures, -- Inert CDictCans - lookupInertDict, findDictsByClass, addDict, addDictsByClass, - delDict, foldDicts, filterDicts, + DictMap, emptyDictMap, lookupInertDict, findDictsByClass, addDict, + addDictsByClass, delDict, foldDicts, filterDicts, findDict, -- Inert CTyEqCans EqualCtList, findTyEqs, foldTyEqs, isInInertEqs, @@ -2914,7 +2914,7 @@ newWantedEq loc role ty1 ty2 where pty = mkPrimEqPredRole role ty1 ty2 --- no equalities here. Use newWantedEqNC instead +-- no equalities here. Use newWantedEq instead newWantedEvVarNC :: CtLoc -> TcPredType -> TcS CtEvidence -- Don't look up in the solved/inerts; we know it's not there newWantedEvVarNC loc pty @@ -2946,6 +2946,14 @@ newWanted loc pty | otherwise = newWantedEvVar loc pty +-- deals with both equalities and non equalities. Doesn't do any cache lookups. +newWantedNC :: CtLoc -> PredType -> TcS CtEvidence +newWantedNC loc pty + | Just (role, ty1, ty2) <- getEqPredTys_maybe pty + = fst <$> newWantedEq loc role ty1 ty2 + | otherwise + = newWantedEvVarNC loc pty + emitNewDerived :: CtLoc -> TcPredType -> TcS () emitNewDerived loc pred = do { ev <- newDerivedNC loc pred diff --git a/docs/users_guide/8.2.1-notes.rst b/docs/users_guide/8.2.1-notes.rst index 9a38299739..d934418588 100644 --- a/docs/users_guide/8.2.1-notes.rst +++ b/docs/users_guide/8.2.1-notes.rst @@ -129,6 +129,15 @@ Compiler output as JSON documents. It is experimental and will be refined depending on feedback from tooling authors for the next release. +- GHC is now able to better optimize polymorphic expressions by using known + superclass dictionaries where possible. Some examples: + + -- uses of `Monad IO` or `Applicative IO` here are improved + foo :: MonadBaseControl IO m => ... + + -- uses of `Monoid MyMonoid` here are improved + bar :: MonadWriter MyMonoid m => ... + GHCi ~~~~ diff --git a/docs/users_guide/using-optimisation.rst b/docs/users_guide/using-optimisation.rst index 48c1e6f3bb..123d74bc8f 100644 --- a/docs/users_guide/using-optimisation.rst +++ b/docs/users_guide/using-optimisation.rst @@ -599,6 +599,37 @@ list. which they are called in this module. Note that specialisation must be enabled (by ``-fspecialise``) for this to have any effect. +.. ghc-flag:: -fsolve-constant-dicts + + :default on + + When solving constraints, try to eagerly solve + super classes using availible dictionaries. + + For example:: + + class M a b where m :: a -> b + + type C a b = (Num a, M a b) + + f :: C Int b => b -> Int -> Int + f _ x = x + 1 + + The body of `f` requires a `Num Int` instance. We could solve this + constraint from the context because we have `C Int b` and that provides us + a + solution for `Num Int`. However, we can often produce much better code + by directly solving for an availible `Num Int` dictionary we might have at + hand. This removes potentially many layers of indirection and crucially + allows other optimisations to fire as the dictionary will be statically + known and selector functions can be inlined. + + The optimisation also works for GADTs which bind dictionaries. If we + statically know which class dictionary we need then we will solve it + directly rather than indirectly using the one passed in at run time. + + + .. ghc-flag:: -fstatic-argument-transformation Turn on the static argument transformation, which turns a recursive diff --git a/testsuite/tests/perf/should_run/T12791.hs b/testsuite/tests/perf/should_run/T12791.hs new file mode 100644 index 0000000000..94ebee77ba --- /dev/null +++ b/testsuite/tests/perf/should_run/T12791.hs @@ -0,0 +1,15 @@ +{-# LANGUAGE MultiParamTypeClasses, FlexibleContexts #-} +module Main ( main ) where + +class Foldable f => C f a where + m :: f a -> a + +instance C [] Int where + m = foldr (+) 0 + +{-# NOINLINE go #-} +go :: C [] b => b -> Int -> Int +go _ i = foldr (+) 0 [1..i] + +main :: IO () +main = print $ go (0 :: Int) 20000 diff --git a/testsuite/tests/perf/should_run/T12791.stdout b/testsuite/tests/perf/should_run/T12791.stdout new file mode 100644 index 0000000000..86d7234cf8 --- /dev/null +++ b/testsuite/tests/perf/should_run/T12791.stdout @@ -0,0 +1 @@ +200010000 diff --git a/testsuite/tests/perf/should_run/T5835.hs b/testsuite/tests/perf/should_run/T5835.hs new file mode 100644 index 0000000000..25787cb137 --- /dev/null +++ b/testsuite/tests/perf/should_run/T5835.hs @@ -0,0 +1,11 @@ +{-# LANGUAGE GADTs #-} +module Main where + +data T t a where + T :: (Foldable t, Eq a) => t a -> T t a + +{-# NOINLINE go #-} +go :: T [] a -> Int -> Int +go (T _) i = foldr (+) 0 [1..i] + +main = print (go (T [1::Int]) 20000) diff --git a/testsuite/tests/perf/should_run/T5835.stdout b/testsuite/tests/perf/should_run/T5835.stdout new file mode 100644 index 0000000000..86d7234cf8 --- /dev/null +++ b/testsuite/tests/perf/should_run/T5835.stdout @@ -0,0 +1 @@ +200010000 diff --git a/testsuite/tests/perf/should_run/all.T b/testsuite/tests/perf/should_run/all.T index be262ba1a9..382c317a9b 100644 --- a/testsuite/tests/perf/should_run/all.T +++ b/testsuite/tests/perf/should_run/all.T @@ -1,3 +1,23 @@ +# T12791 and T5835 test that GHC uses top-level instances in places where using +# a locally given solution would produce worse code. +# See Note [Solving from instances when interacting Dicts] + +test('T5835', + [stats_num_field('max_bytes_used', + [(wordsize(64), 44312, 10)]), + only_ways(['normal']) + ], + compile_and_run, + ['-O']) + +test('T12791', + [stats_num_field('max_bytes_used', + [(wordsize(64), 44312, 10)]), + only_ways(['normal']) + ], + compile_and_run, + ['-O']) + # Tests that newArray/newArray_ is being optimised correctly test('T10359', @@ -346,7 +366,7 @@ test('T7954', ['-O']) test('T7850', - [stats_num_field('peak_megabytes_allocated', + [stats_num_field('peak_megabytes_allocated', [(wordsize(32), 2, 10), (wordsize(64), 4, 10)]), only_ways(['normal'])], @@ -377,7 +397,7 @@ test('T4267', test('T7619', [stats_num_field('bytes allocated', [ (wordsize(32), 36012, 10) - # 32-bit close to 64-bit value; most of this very + # 32-bit close to 64-bit value; most of this very # small number is standard start-up boilerplate I think , (wordsize(64), 40992, 10) ]), # previously, it was >400000 bytes @@ -397,7 +417,7 @@ test('InlineByteArrayAlloc', [stats_num_field('bytes allocated', [ (wordsize(32), 1360036012, 5) , (wordsize(64), 1440040960, 5) ]), - # 32 and 64 bit not so different, because + # 32 and 64 bit not so different, because # we are allocating *byte* arrays only_ways(['normal'])], compile_and_run, @@ -405,7 +425,7 @@ test('InlineByteArrayAlloc', test('InlineCloneArrayAlloc', [stats_num_field('bytes allocated', - [ (wordsize(32), 800041120, 5) + [ (wordsize(32), 800041120, 5) , (wordsize(64), 1600041120, 5) ]), only_ways(['normal'])], compile_and_run, |