diff options
author | Simon Peyton Jones <simon.peytonjones@gmail.com> | 2023-02-21 10:51:34 +0000 |
---|---|---|
committer | Marge Bot <ben+marge-bot@smart-cactus.org> | 2023-02-28 18:54:59 -0500 |
commit | 7192ef91c855e1fae6997f75cfde76aafd0b4bcf (patch) | |
tree | aef67a692c95e4e11b50d855ba651784eb89c109 | |
parent | 239202a2b14714740e016d7bbcd4f351356fcb00 (diff) | |
download | haskell-7192ef91c855e1fae6997f75cfde76aafd0b4bcf.tar.gz |
Take more care with unlifted bindings in the specialiser
As #22998 showed, we were floating an unlifted binding to top
level, which breaks a Core invariant.
The fix is easy, albeit a little bit conservative. See
Note [Care with unlifted bindings] in GHC.Core.Opt.Specialise
-rw-r--r-- | compiler/GHC/Core.hs | 113 | ||||
-rw-r--r-- | compiler/GHC/Core/Opt/Specialise.hs | 29 | ||||
-rw-r--r-- | testsuite/tests/simplCore/should_run/T22998.hs | 10 | ||||
-rw-r--r-- | testsuite/tests/simplCore/should_run/T22998.stdout | 1 | ||||
-rw-r--r-- | testsuite/tests/simplCore/should_run/all.T | 2 |
5 files changed, 102 insertions, 53 deletions
diff --git a/compiler/GHC/Core.hs b/compiler/GHC/Core.hs index db332b421c..a92252a61c 100644 --- a/compiler/GHC/Core.hs +++ b/compiler/GHC/Core.hs @@ -366,68 +366,32 @@ a Coercion, (sym c). Note [Core letrec invariant] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The right hand sides of all top-level and recursive @let@s -/must/ be of lifted type (see "Type#type_classification" for -the meaning of /lifted/ vs. /unlifted/). +The Core letrec invariant: -There is one exception to this rule, top-level @let@s are -allowed to bind primitive string literals: see -Note [Core top-level string literals]. + The right hand sides of all + /top-level/ or /recursive/ + bindings must be of lifted type -Note [Core top-level string literals] -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -As an exception to the usual rule that top-level binders must be lifted, -we allow binding primitive string literals (of type Addr#) of type Addr# at the -top level. This allows us to share string literals earlier in the pipeline and -crucially allows other optimizations in the Core2Core pipeline to fire. -Consider, + There is one exception to this rule, top-level @let@s are + allowed to bind primitive string literals: see + Note [Core top-level string literals]. - f n = let a::Addr# = "foo"# - in \x -> blah +See "Type#type_classification" in GHC.Core.Type +for the meaning of "lifted" vs. "unlifted"). -In order to be able to inline `f`, we would like to float `a` to the top. -Another option would be to inline `a`, but that would lead to duplicating string -literals, which we want to avoid. See #8472. - -The solution is simply to allow top-level unlifted binders. We can't allow -arbitrary unlifted expression at the top-level though, unlifted binders cannot -be thunks, so we just allow string literals. - -We allow the top-level primitive string literals to be wrapped in Ticks -in the same way they can be wrapped when nested in an expression. -CoreToSTG currently discards Ticks around top-level primitive string literals. -See #14779. - -Also see Note [Compilation plan for top-level string literals]. - -Note [Compilation plan for top-level string literals] -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Here is a summary on how top-level string literals are handled by various -parts of the compilation pipeline. - -* In the source language, there is no way to bind a primitive string literal - at the top level. - -* In Core, we have a special rule that permits top-level Addr# bindings. See - Note [Core top-level string literals]. Core-to-core passes may introduce - new top-level string literals. - -* In STG, top-level string literals are explicitly represented in the syntax - tree. - -* A top-level string literal may end up exported from a module. In this case, - in the object file, the content of the exported literal is given a label with - the _bytes suffix. +For the non-top-level, non-recursive case see Note [Core let-can-float invariant]. Note [Core let-can-float invariant] -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The let-can-float invariant: - The right hand side of a non-recursive 'Let' - /may/ be of unlifted type, but only if + The right hand side of a /non-top-level/, /non-recursive/ binding + may be of unlifted type, but only if the expression is ok-for-speculation or the 'Let' is for a join point. + (For top-level or recursive lets see Note [Core letrec invariant].) + This means that the let can be floated around without difficulty. For example, this is OK: @@ -466,6 +430,53 @@ we need to allow lots of things in the arguments of a call. TL;DR: we relaxed the let/app invariant to become the let-can-float invariant. +Note [Core top-level string literals] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +As an exception to the usual rule that top-level binders must be lifted, +we allow binding primitive string literals (of type Addr#) of type Addr# at the +top level. This allows us to share string literals earlier in the pipeline and +crucially allows other optimizations in the Core2Core pipeline to fire. +Consider, + + f n = let a::Addr# = "foo"# + in \x -> blah + +In order to be able to inline `f`, we would like to float `a` to the top. +Another option would be to inline `a`, but that would lead to duplicating string +literals, which we want to avoid. See #8472. + +The solution is simply to allow top-level unlifted binders. We can't allow +arbitrary unlifted expression at the top-level though, unlifted binders cannot +be thunks, so we just allow string literals. + +We allow the top-level primitive string literals to be wrapped in Ticks +in the same way they can be wrapped when nested in an expression. +CoreToSTG currently discards Ticks around top-level primitive string literals. +See #14779. + +Also see Note [Compilation plan for top-level string literals]. + +Note [Compilation plan for top-level string literals] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Here is a summary on how top-level string literals are handled by various +parts of the compilation pipeline. + +* In the source language, there is no way to bind a primitive string literal + at the top level. + +* In Core, we have a special rule that permits top-level Addr# bindings. See + Note [Core top-level string literals]. Core-to-core passes may introduce + new top-level string literals. + + See GHC.Core.Utils.exprIsTopLevelBindable, and exprIsTickedString + +* In STG, top-level string literals are explicitly represented in the syntax + tree. + +* A top-level string literal may end up exported from a module. In this case, + in the object file, the content of the exported literal is given a label with + the _bytes suffix. + Note [NON-BOTTOM-DICTS invariant] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ It is a global invariant (not checkable by Lint) that diff --git a/compiler/GHC/Core/Opt/Specialise.hs b/compiler/GHC/Core/Opt/Specialise.hs index f028bd428e..f48aeb50d7 100644 --- a/compiler/GHC/Core/Opt/Specialise.hs +++ b/compiler/GHC/Core/Opt/Specialise.hs @@ -27,7 +27,7 @@ import GHC.Core import GHC.Core.Make ( mkLitRubbish ) import GHC.Core.Unify ( tcMatchTy ) import GHC.Core.Rules -import GHC.Core.Utils ( exprIsTrivial +import GHC.Core.Utils ( exprIsTrivial, exprIsTopLevelBindable , mkCast, exprType , stripTicksTop, mkInScopeSetBndrs ) import GHC.Core.FVs @@ -1515,7 +1515,10 @@ specBind top_lvl env (NonRec fn rhs) do_body = [mkDB $ NonRec b r | (b,r) <- pairs] ++ fromOL dump_dbs - ; if float_all then + can_float_this_one = exprIsTopLevelBindable rhs (idType fn) + -- exprIsTopLevelBindable: see Note [Care with unlifted bindings] + + ; if float_all && can_float_this_one then -- Rather than discard the calls mentioning the bound variables -- we float this (dictionary) binding along with the others return ([], body', all_free_uds `snocDictBinds` final_binds) @@ -1876,6 +1879,28 @@ even in the case that @x = False@! Instead, we add a dummy 'Void#' argument to the specialisation '$sfInt' ($sfInt :: Void# -> Array# Int) in order to preserve laziness. +Note [Care with unlifted bindings] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider (#22998) + f x = let x::ByteArray# = <some literal> + n::Natural = NB x + in wombat @192827 (n |> co) +where + co :: Natural ~ KnownNat 192827 + wombat :: forall (n:Nat). KnownNat n => blah + +Left to itself, the specialiser would float the bindings for `x` and `n` to top +level, so we can specialise `wombat`. But we can't have a top-level ByteArray# +(see Note [Core letrec invariant] in GHC.Core). Boo. + +This is pretty exotic, so we take a simple way out: in specBind (the NonRec +case) do not float the binding itself unless it satisfies exprIsTopLevelBindable. +This is conservative: maybe the RHS of `x` has a free var that would stop it +floating to top level anyway; but that is hard to spot (since we don't know what +the non-top-level in-scope binders are) and rare (since the binding must satisfy +Note [Core let-can-float invariant] in GHC.Core). + + Note [Specialising Calls] ~~~~~~~~~~~~~~~~~~~~~~~~~ Suppose we have a function with a complicated type: diff --git a/testsuite/tests/simplCore/should_run/T22998.hs b/testsuite/tests/simplCore/should_run/T22998.hs new file mode 100644 index 0000000000..459f576d82 --- /dev/null +++ b/testsuite/tests/simplCore/should_run/T22998.hs @@ -0,0 +1,10 @@ +{-# LANGUAGE DataKinds #-} +module Main where + +import Data.Proxy (Proxy(Proxy)) +import GHC.TypeLits (natVal) + +main :: IO () +main = print x + where + x = natVal @18446744073709551616 Proxy + natVal @18446744073709551616 Proxy diff --git a/testsuite/tests/simplCore/should_run/T22998.stdout b/testsuite/tests/simplCore/should_run/T22998.stdout new file mode 100644 index 0000000000..1ce484120a --- /dev/null +++ b/testsuite/tests/simplCore/should_run/T22998.stdout @@ -0,0 +1 @@ +36893488147419103232 diff --git a/testsuite/tests/simplCore/should_run/all.T b/testsuite/tests/simplCore/should_run/all.T index 527f44a1bc..928bf89df9 100644 --- a/testsuite/tests/simplCore/should_run/all.T +++ b/testsuite/tests/simplCore/should_run/all.T @@ -108,3 +108,5 @@ test('T21575', normal, compile_and_run, ['-O']) test('T21575b', [], multimod_compile_and_run, ['T21575b', '-O']) test('T20836', normal, compile_and_run, ['-O0']) # Should not time out; See #20836 test('T22448', normal, compile_and_run, ['-O1']) +test('T22998', normal, compile_and_run, ['-O0 -fspecialise -dcore-lint']) + |