delta/haskell.git - gitlab.haskell.org: ghc/ghc.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	Demand: Simplify `CU(U)` to `U` (#19005)	Sebastian Graf	2020-12-12	2	-4/+4
\| \| \| \| \| \|	Both sub-demands encode the same information. This is a trivial change and already affects a few regression tests (e.g. `T5075`), so no separate regression test is necessary.
*	DmdAnal: Annotate top-level function bindings with demands (#18894)	Sebastian Graf	2020-12-12	5	-0/+643
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's useful to annotate a non-exported top-level function like `g` in ```hs module Lib (h) where g :: Int -> Int -> (Int,Int) g m 1 = (m, 0) g m n = (2 * m, 2 `div` n) {-# NOINLINE g #-} h :: Int -> Int h 1 = 0 h m \| odd m = snd (g m 2) \| otherwise = uncurry (+) (g 2 m) ``` with its demand `UCU(CS(P(1P(U),SP(U))`, which tells us that whenever `g` was called, the second component of the returned pair was evaluated strictly. Since #18903 we do so for local functions, where we can see all calls. For top-level functions, we can assume that all exported functions are demanded according to `topDmd` and thus get sound demands for non-exported top-level functions. The demand on `g` is crucial information for Nested CPR, which may the go on and unbox `g` for the second pair component. That is true even if that pair component may diverge, as is the case for the call site `g 13 0`, which throws a div-by-zero exception. In `T18894b`, you can even see the new demand annotation enabling us to eta-expand a function that we wouldn't be able to eta-expand without Call Arity. We only track bindings of function type in order not to risk huge compile-time regressions, see `isInterestingTopLevelFn`. There was a CoreLint check that rejected strict demand annotations on recursive or top-level bindings, which seems completely unjustified. All the cases I investigated were fine, so I removed it. Fixes #18894.
*	Demand: Interleave usage and strictness demands (#18903)	Sebastian Graf	2020-11-20	23	-76/+267
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As outlined in #18903, interleaving usage and strictness demands not only means a more compact demand representation, but also allows us to express demands that we weren't easily able to express before. Call demands are relative in the sense that a call demand `Cn(cd)` on `g` says "`g` is called `n` times. Whenever `g` is called, the result is used according to `cd`". Example from #18903: ```hs h :: Int -> Int h m = let g :: Int -> (Int,Int) g 1 = (m, 0) g n = (2 * n, 2 `div` n) {-# NOINLINE g #-} in case m of 1 -> 0 2 -> snd (g m) _ -> uncurry (+) (g m) ``` Without the interleaved representation, we would just get `L` for the strictness demand on `g`. Now we are able to express that whenever `g` is called, its second component is used strictly in denoting `g` by `1C1(P(1P(U),SP(U)))`. This would allow Nested CPR to unbox the division, for example. Fixes #18903. While fixing regressions, I also discovered and fixed #18957. Metric Decrease: T13253-spj
*	Arity: Record arity types for non-recursive lets	Sebastian Graf	2020-10-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In #18793, we saw a compelling example which requires us to look at non-recursive let-bindings during arity analysis and unleash their arity types at use sites. After the refactoring in the previous patch, the needed change is quite simple and very local to `arityType`'s defn for non-recurisve `Let`. Apart from that, we had to get rid of the second item of `Note [Dealing with bottoms]`, which was entirely a safety measure and hindered optimistic fixed-point iteration. Fixes #18793. The following metric increases are all caused by this commit and a result of the fact that we just do more work now: Metric Increase: T3294 T12545 T12707
*	Fix some missed opportunities for preInlineUnconditionally	Simon Peyton Jones	2020-10-14	3	-291/+236
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are two signficant changes here: * Ticket #18815 showed that we were missing some opportunities for preInlineUnconditionally. The one-line fix is in the code for GHC.Core.Opt.Simplify.Utils.preInlineUnconditionally, which now switches off only for INLINE pragmas. I expanded Note [Stable unfoldings and preInlineUnconditionally] to explain. * When doing this I discovered a way in which preInlineUnconditionally was occasionally /too/ eager. It's all explained in Note [Occurrences in stable unfoldings] in GHC.Core.Opt.OccurAnal, and the one-line change adding markAllMany to occAnalUnfolding. I also got confused about what NoUserInline meant, so I've renamed it to NoUserInlinePrag, and changed its pretty-printing slightly. That led to soem error messate wibbling, and touches quite a few files, but there is no change in functionality. I did a nofib run. As expected, no significant changes. Program Size Allocs ---------------------------------------- sphere -0.0% -0.4% ---------------------------------------- Min -0.0% -0.4% Max -0.0% +0.0% Geometric Mean -0.0% -0.0% I'm allowing a max-residency increase for T10370, which seems very irreproducible. (See comments on !4241.) There is always sampling error for max-residency measurements; and in any case the change shows up on some platforms but not others. Metric Increase: T10370
*	New linear types syntax: a %p -> b (#18459)	Vladislav Zavialov	2020-09-29	1	-1/+1
\| \| \| \| \| \|	Implements GHC Proposal #356 Updates the haddock submodule.
*	Better eta-expansion (again) and don't specilise DFuns	Simon Peyton Jones	2020-09-22	1	-21/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes #18223, which made GHC generate an exponential amount of code. There are three quite separate changes in here 1. Re-engineer eta-expansion (again). The eta-expander was generating lots of intermediate stuff, which could be optimised away, but which choked the simplifier meanwhile. Relatively easy to kill it off at source. See Note [The EtaInfo mechanism] in GHC.Core.Opt.Arity. The main new thing is the use of pushCoArg in getArg_maybe. 2. Stop Specialise specalising DFuns. This is the cause of a huge (and utterly unnecessary) blowup in program size in #18223. See Note [Do not specialise DFuns] in GHC.Core.Opt.Specialise. I also refactored the Specialise monad a bit... it was silly, because it passed on unchanging values as if they were mutable state. 3. Do an extra Simplifer run, after SpecConstra and before late-Specialise. I found (investigating perf/compiler/T16473) that failing to do this was crippling both SpecConstr and Specialise. See Note [Simplify after SpecConstr] in GHC.Core.Opt.Pipeline. This change does mean an extra run of the Simplifier, but only with -O2, and I think that's acceptable. T16473 allocates three times less with this change. (I changed it to check runtime rather than compile time.) Some smaller consequences * I moved pushCoercion, pushCoArg and friends from SimpleOpt to Arity, because it was needed by the new etaInfoApp. And pushCoValArg now returns a MCoercion rather than Coercion for the argument Coercion. * A minor, incidental improvement to Core pretty-printing This does fix #18223, (which was otherwise uncompilable. Hooray. But there is still a big intermediate because there are some very deeply nested types in that program. Modest reductions in compile-time allocation on a couple of benchmarks T12425 -2.0% T13253 -10.3% Metric increase with -O2, due to extra simplifier run T9233 +5.8% T12227 +1.8% T15630 +5.0% There is a spurious apparent increase on heap residency on T9630, on some architectures at least. I tried it with -G1 and the residency is essentially unchanged. Metric Increase T9233 T12227 T9630 Metric Decrease T12425 T13253
*	This patch addresses the exponential blow-up in the simplifier.	Simon Peyton Jones	2020-07-28	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Specifically: #13253 exponential inlining #10421 ditto #18140 strict constructors #18282 another nested-function call case This patch makes one really significant changes: change the way that mkDupableCont handles StrictArg. The details are explained in GHC.Core.Opt.Simplify Note [Duplicating StrictArg]. Specific changes * In mkDupableCont, when making auxiliary bindings for the other arguments of a call, add extra plumbing so that we don't forget the demand on them. Otherwise we haev to wait for another round of strictness analysis. But actually all the info is to hand. This change affects: - Make the strictness list in ArgInfo be [Demand] instead of [Bool], and rename it to ai_dmds. - Add as_dmd to ValArg - Simplify.makeTrivial takes a Demand - mkDupableContWithDmds takes a [Demand] There are a number of other small changes 1. For Ids that are used at most once in each branch of a case, make the occurrence analyser record the total number of syntactic occurrences. Previously we recorded just OneBranch or MultipleBranches. I thought this was going to be useful, but I ended up barely using it; see Note [Note [Suppress exponential blowup] in GHC.Core.Opt.Simplify.Utils Actual changes: * See the occ_n_br field of OneOcc. * postInlineUnconditionally 2. I found a small perf buglet in SetLevels; see the new function GHC.Core.Opt.SetLevels.hasFreeJoin 3. Remove the sc_cci field of StrictArg. I found I could get its information from the sc_fun field instead. Less to get wrong! 4. In ArgInfo, arrange that ai_dmds and ai_discs have a simpler invariant: they line up with the value arguments beyond ai_args This allowed a bit of nice refactoring; see isStrictArgInfo, lazyArgcontext, strictArgContext There is virtually no difference in nofib. (The runtime numbers are bogus -- I tried a few manually.) Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- fft +0.0% -2.0% -48.3% -49.4% 0.0% multiplier +0.0% -2.2% -50.3% -50.9% 0.0% -------------------------------------------------------------------------------- Min -0.4% -2.2% -59.2% -60.4% 0.0% Max +0.0% +0.1% +3.3% +4.9% 0.0% Geometric Mean +0.0% -0.0% -33.2% -34.3% -0.0% Test T18282 is an existing example of these deeply-nested strict calls. We get a big decrease in compile time (-85%) because so much less inlining takes place. Metric Decrease: T18282
*	Update testsuite	Sylvain Henry	2020-06-17	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* support detection of slow ghc-bignum backend (to replace the detection of integer-simple use). There are still some test cases that the native backend doesn't handle efficiently enough. * remove tests for GMP only functions that have been removed from ghc-bignum * fix test results showing dependent packages (e.g. integer-gmp) or showing suggested instances * fix test using Integer/Natural API or showing internal names
*	Linear types (#15981)	Krzysztof Gogolewski	2020-06-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the first step towards implementation of the linear types proposal (https://github.com/ghc-proposals/ghc-proposals/pull/111). It features * A language extension -XLinearTypes * Syntax for linear functions in the surface language * Linearity checking in Core Lint, enabled with -dlinear-core-lint * Core-to-core passes are mostly compatible with linearity * Fields in a data type can be linear or unrestricted; linear fields have multiplicity-polymorphic constructors. If -XLinearTypes is disabled, the GADT syntax defaults to linear fields The following items are not yet supported: * a # m -> b syntax (only prefix FUN is supported for now) * Full multiplicity inference (multiplicities are really only checked) * Decent linearity error messages * Linear let, where, and case expressions in the surface language (each of these currently introduce the unrestricted variant) * Multiplicity-parametric fields * Syntax for annotating lambda-bound or let-bound with a multiplicity * Syntax for non-linear/multiple-field-multiplicity records * Linear projections for records with a single linear field * Linear pattern synonyms * Multiplicity coercions (test LinearPolyType) A high-level description can be found at https://ghc.haskell.org/trac/ghc/wiki/LinearTypes/Implementation Following the link above you will find a description of the changes made to Core. This commit has been authored by * Richard Eisenberg * Krzysztof Gogolewski * Matthew Pickering * Arnaud Spiwack With contributions from: * Mark Barbone * Alexander Vershilov Updates haddock submodule.
*	Implement cast worker/wrapper properly	Simon Peyton Jones	2020-06-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The cast worker/wrapper transformation transforms x = e \|> co into y = e x = y \|> co This is done by the simplifier, but we were being careless about transferring IdInfo from x to y, and about what to do if x is a NOINLNE function. This resulted in a series of bugs: #17673, #18093, #18078. This patch fixes all that: * Main change is in GHC.Core.Opt.Simplify, and the new prepareBinding function, which does this cast worker/wrapper transform. See Note [Cast worker/wrappers]. * There is quite a bit of refactoring around prepareRhs, makeTrivial etc. It's nicer now. * Some wrappers from strictness and cast w/w, notably those for a function with a NOINLINE, should inline very late. There wasn't really a mechanism for that, which was an existing bug really; so I invented a new finalPhase = Phase (-1). It's used for all simplifier runs after the user-visible phase 2,1,0 have run. (No new runs of the simplifier are introduced thereby.) See new Note [Compiler phases] in GHC.Types.Basic; the main changes are in GHC.Core.Opt.Driver * Doing this made me trip over two places where the AnonArgFlag on a FunTy was being lost so we could end up with (Num a -> ty) rather than (Num a => ty) - In coercionLKind/coercionRKind - In contHoleType in the Simplifier I fixed the former by defining mkFunctionType and using it in coercionLKind/RKind. I could have done the same for the latter, but the information is almost to hand. So I fixed the latter by - adding sc_hole_ty to ApplyToVal (like ApplyToTy), - adding as_hole_ty to ValArg (like TyArg) - adding sc_fun_ty to StrictArg Turned out I could then remove ai_type from ArgInfo. This is just moving the deck chairs around, but it worked out nicely. See the new Note [AnonArgFlag] in GHC.Types.Var * When looking at the 'arity decrease' thing (#18093) I discovered that stable unfoldings had a much lower arity than the actual optimised function. That's what led to the arity-decrease message. Simple solution: eta-expand. It's described in Note [Eta-expand stable unfoldings] in GHC.Core.Opt.Simplify * I also discovered that unsafeCoerce wasn't being inlined if the context was boring. So (\x. f (unsafeCoerce x)) would create a thunk -- yikes! I fixed that by making inlineBoringOK a bit cleverer: see Note [Inline unsafeCoerce] in GHC.Core.Unfold. I also found that unsafeCoerceName was unused, so I removed it. I made a test case for #18078, and a very similar one for #17673. The net effect of all this on nofib is very modest, but positive: -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- anna -0.4% -0.1% -3.1% -3.1% 0.0% fannkuch-redux -0.4% -0.3% -0.1% -0.1% 0.0% maillist -0.4% -0.1% -7.8% -1.0% -14.3% primetest -0.4% -15.6% -7.1% -6.6% 0.0% -------------------------------------------------------------------------------- Min -0.9% -15.6% -13.3% -14.2% -14.3% Max -0.3% 0.0% +12.1% +12.4% 0.0% Geometric Mean -0.4% -0.2% -2.3% -2.2% -0.1% All following metric decreases are compile-time allocation decreases between -1% and -3%: Metric Decrease: T5631 T13701 T14697 T15164
*	DmdAnal: Recognise precise exceptions from case alternatives (#18086)	Sebastian Graf	2020-05-28	3	-0/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Consider ```hs m :: IO () m = do putStrLn "foo" error "bar" ``` `m` (from #18086) always throws a (precise or imprecise) exception or diverges. Yet demand analysis infers `<L,A>` as demand signature instead of `<L,A>x` for it. That's because the demand analyser sees `putStrLn` occuring in a case scrutinee and decides that it has to `deferAfterPreciseException`, because `putStrLn` throws a precise exception on some control flow paths. This will mask the `botDiv` `Divergence`of the single case alt containing `error` to `topDiv`. Since `putStrLn` has `topDiv` itself, the final `Divergence` is `topDiv`. This is easily fixed: `deferAfterPreciseException` works by `lub`ing with the demand type of a virtual case branch denoting the precise exceptional control flow. We used `nopDmdType` before, but we can be more precise and use `exnDmdType`, which is `nopDmdType` with `exnDiv`. Now the `Divergence` from the case alt will degrade `botDiv` to `exnDiv` instead of `topDiv`, which combines with the result from the scrutinee to `exnDiv`, and all is well. Fixes #18086.
*	Make WorkWrap.Lib.isWorkerSmallEnough aware of the old arity	Sebastian Graf	2020-05-26	3	-0/+92
\| \| \| \| \| \| \| \| \| \|	We should allow a wrapper with up to 82 parameters when the original function had 82 parameters to begin with. I verified that this made no difference on NoFib, but then again it doesn't use huge records... Fixes #18122.
*	DmdAnal: Improve handling of precise exceptions	Sebastian Graf	2020-05-15	15	-113/+275
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch does two things: Fix possible unsoundness in what was called the "IO hack" and implement part 2.1 of the "fixing precise exceptions" plan in https://gitlab.haskell.org/ghc/ghc/wikis/fixing-precise-exceptions, which, in combination with !2956, supersedes !3014 and !2525. IO hack The "IO hack" (which is a fallback to preserve precise exceptions semantics and thus soundness, rather than some smart thing that increases precision) is called `exprMayThrowPreciseException` now. I came up with two testcases exemplifying possible unsoundness (if twisted enough) in the old approach: - `T13380d`: Demonstrating unsoundness of the "IO hack" when resorting to manual state token threading and direct use of primops. More details below. - `T13380e`: Demonstrating unsoundness of the "IO hack" when we have Nested CPR. Not currently relevant, as we don't have Nested CPR yet. - `T13380f`: Demonstrating unsoundness of the "IO hack" for safe FFI calls. Basically, the IO hack assumed that precise exceptions can only be thrown from a case scrutinee of type `(# State# RealWorld, _ #)`. I couldn't come up with a program using the `IO` abstraction that violates this assumption. But it's easy to do so via manual state token threading and direct use of primops, see `T13380d`. Also similar code might be generated by Nested CPR in the (hopefully not too) distant future, see `T13380e`. Hence, we now have a more careful test in `forcesRealWorld` that passes `T13380{d,e}` (and will hopefully be robust to Nested CPR). Precise exceptions In #13380 and #17676 we saw that we didn't preserve precise exception semantics in demand analysis. We fixed that with minimal changes in !2956, but that was terribly unprincipled. That unprincipledness resulted in a loss of precision, which is tracked by these new test cases: - `T13380b`: Regression in dead code elimination, because !2956 was too syntactic about `raiseIO#` - `T13380c`: No need to apply the "IO hack" when the IO action may not throw a precise exception (and the existing IO hack doesn't detect that) Fixing both issues in !3014 turned out to be too complicated and had the potential to regress in the future. Hence we decided to only fix `T13380b` and augment the `Divergence` lattice with a new middle-layer element, `ExnOrDiv`, which means either `Diverges` (, throws an imprecise exception) or throws a precise exception. See the wiki page on Step 2.1 for more implementational details: https://gitlab.haskell.org/ghc/ghc/wikis/fixing-precise-exceptions#dead-code-elimination-for-raiseio-with-isdeadenddiv-introducing-exnordiv-step-21
*	CprAnal: Don't attach CPR sigs to expandable bindings (#18154)	Sebastian Graf	2020-05-13	14	-58/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead, look through expandable unfoldings in `cprTransform`. See the new Note [CPR for expandable unfoldings]: ``` Long static data structures (whether top-level or not) like xs = x1 : xs1 xs1 = x2 : xs2 xs2 = x3 : xs3 should not get CPR signatures, because they * Never get WW'd, so their CPR signature should be irrelevant after analysis (in fact the signature might even be harmful for that reason) * Would need to be inlined/expanded to see their constructed product * Recording CPR on them blows up interface file sizes and is redundant with their unfolding. In case of Nested CPR, this blow-up can be quadratic! But we can't just stop giving DataCon application bindings the CPR property, for example fac 0 = 1 fac n = n * fac (n-1) fac certainly has the CPR property and should be WW'd! But FloatOut will transform the first clause to lvl = 1 fac 0 = lvl If lvl doesn't have the CPR property, fac won't either. But lvl doesn't have a CPR signature to extrapolate into a CPR transformer ('cprTransform'). So instead we keep on cprAnal'ing through expandable unfoldings for these arity 0 bindings via 'cprExpandUnfolding_maybe'. In practice, GHC generates a lot of (nested) TyCon and KindRep bindings, one for each data declaration. It's wasteful to attach CPR signatures to each of them (and intractable in case of Nested CPR). ``` Fixes #18154.
*	Unit: split and rename modules	Sylvain Henry	2020-04-30	5	-10/+10
\| \| \| \| \| \| \|	Introduce GHC.Unit.* hierarchy for everything concerning units, packages and modules. Update Haddock submodule
*	Mark DataCon wrappers CONLIKE	Alexis King	2020-04-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that DataCon wrappers don’t inline until phase 0 (see commit b78cc64e923716ac0512c299f42d4d0012306c05), it’s important that case-of-known-constructor and RULE matching be able to see saturated applications of DataCon wrappers in unfoldings. Making them conlike is a natural way to do it, since they are, in fact, precisely the sort of thing the CONLIKE pragma exists to solve. Fixes #18012. This also bumps the version of the parsec submodule to incorporate a patch that avoids a metric increase on the haddock perf tests. The increase was not really a flaw in this patch, as parsec was implicitly relying on inlining heuristics. The patch to parsec just adds some INLINABLE pragmas, and we get a nice performance bump out of it (well beyond the performance we lost from this patch). Metric Decrease: T12234 WWRec haddock.Cabal haddock.base haddock.compiler
*	GHC.Core.Opt renaming	Sylvain Henry	2020-04-18	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	* GHC.Core.Op => GHC.Core.Opt * GHC.Core.Opt.Simplify.Driver => GHC.Core.Opt.Driver * GHC.Core.Opt.Tidy => GHC.Core.Tidy * GHC.Core.Opt.WorkWrap.Lib => GHC.Core.Opt.WorkWrap.Utils As discussed in: * https://mail.haskell.org/pipermail/ghc-devs/2020-April/018758.html * https://gitlab.haskell.org/ghc/ghc/issues/13009#note_264650
*	Preserve precise exceptions in strictness analysis	Sebastian Graf	2020-04-02	2	-1/+20
\| \| \| \| \| \| \| \| \| \| \|	Fix #13380 and #17676 by 1. Changing `raiseIO#` to have `topDiv` instead of `botDiv` 2. Give it special treatment in `Simplifier.Util.mkArgInfo`, treating it as if it still had `botDiv`, to recover dead code elimination. This is the first commit of the plan outlined in https://gitlab.haskell.org/ghc/ghc/-/merge_requests/2525#note_260886.
*	Demand analysis: simplify the demand for a RHS	Simon Peyton Jones	2020-03-29	8	-336/+325
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ticket #17932 showed that we were using a stupid demand for the RHS of a let-binding, when the result is a product. This was the result of a "fix" in 2013, which (happily) turns out to no longer be necessary. So I just deleted the code, which simplifies the demand analyser, and fixes #17932. That in turn uncovered that the anticipation of worker/wrapper in CPR analysis was inaccurate, hence the logic that decides whether to unbox an argument in WW was extracted into a function `wantToUnbox`, now consulted by CPR analysis. I tried nofib, and got 0.0% perf changes. All this came up when messing about with !2873 (ticket #17917), but is idependent of it. Unfortunately, this patch regresses #4267 and realised that it is now blocked on #16335.
*	Modules: Core operations (#13009)	Sylvain Henry	2020-03-18	2	-2/+2
\|
*	Normalize types when dropping absent arguments from workers	Alexis King	2020-02-20	3	-0/+12
\| \| \| \|	fixes #17852
*	Always display inferred variables using braces	Krzysztof Gogolewski	2020-02-12	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	We now always show "forall {a}. T" for inferred variables, previously this was controlled by -fprint-explicit-foralls. This implements part 1 of https://github.com/ghc-proposals/ghc-proposals/pull/179. Part of GHC ticket #16320. Furthermore, when printing a levity restriction error, we now display the HsWrap of the expression. This lets users see the full elaboration with -fprint-typechecker-elaboration (see also #17670)
*	Separate CPR analysis from the Demand analyserwip/sep-cpr	Sebastian Graf	2020-02-12	15	-95/+201
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The reasons for that can be found in the wiki: https://gitlab.haskell.org/ghc/ghc/wikis/nested-cpr/split-off-cpr We now run CPR after demand analysis (except for after the final demand analysis run just before code gen). CPR got its own dump flags (`-ddump-cpr-anal`, `-ddump-cpr-signatures`), but not its own flag to activate/deactivate. It will run with `-fstrictness`/`-fworker-wrapper`. As explained on the wiki page, this step is necessary for a sane Nested CPR analysis. And it has quite positive impact on compiler performance: Metric Decrease: T9233 T9675 T9961 T15263
*	Do CafInfo/SRT analysis in Cmm	Ömer Sinan Ağacan	2020-01-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch removes all CafInfo predictions and various hacks to preserve predicted CafInfos from the compiler and assigns final CafInfos to interface Ids after code generation. SRT analysis is extended to support static data, and Cmm generator is modified to allow generating static_link fields after SRT analysis. This also fixes `-fcatch-bottoms`, which introduces error calls in case expressions in CorePrep, which runs after CoreTidy (which is where we decide on CafInfos) and turns previously non-CAFFY things into CAFFY. Fixes #17648 Fixes #9718 Evaluation ========== NoFib ----- Boot with: `make boot mode=fast` Run: `make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1` -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- CS -0.0% 0.0% -0.0% -0.0% -0.0% CSD -0.0% 0.0% -0.0% -0.0% -0.0% FS -0.0% 0.0% -0.0% -0.0% -0.0% S -0.0% 0.0% -0.0% -0.0% -0.0% VS -0.0% 0.0% -0.0% -0.0% -0.0% VSD -0.0% 0.0% -0.0% -0.0% -0.5% VSM -0.0% 0.0% -0.0% -0.0% -0.0% anna -0.1% 0.0% -0.0% -0.0% -0.0% ansi -0.0% 0.0% -0.0% -0.0% -0.0% atom -0.0% 0.0% -0.0% -0.0% -0.0% awards -0.0% 0.0% -0.0% -0.0% -0.0% banner -0.0% 0.0% -0.0% -0.0% -0.0% bernouilli -0.0% 0.0% -0.0% -0.0% -0.0% binary-trees -0.0% 0.0% -0.0% -0.0% -0.0% boyer -0.0% 0.0% -0.0% -0.0% -0.0% boyer2 -0.0% 0.0% -0.0% -0.0% -0.0% bspt -0.0% 0.0% -0.0% -0.0% -0.0% cacheprof -0.0% 0.0% -0.0% -0.0% -0.0% calendar -0.0% 0.0% -0.0% -0.0% -0.0% cichelli -0.0% 0.0% -0.0% -0.0% -0.0% circsim -0.0% 0.0% -0.0% -0.0% -0.0% clausify -0.0% 0.0% -0.0% -0.0% -0.0% comp_lab_zift -0.0% 0.0% -0.0% -0.0% -0.0% compress -0.0% 0.0% -0.0% -0.0% -0.0% compress2 -0.0% 0.0% -0.0% -0.0% -0.0% constraints -0.0% 0.0% -0.0% -0.0% -0.0% cryptarithm1 -0.0% 0.0% -0.0% -0.0% -0.0% cryptarithm2 -0.0% 0.0% -0.0% -0.0% -0.0% cse -0.0% 0.0% -0.0% -0.0% -0.0% digits-of-e1 -0.0% 0.0% -0.0% -0.0% -0.0% digits-of-e2 -0.0% 0.0% -0.0% -0.0% -0.0% dom-lt -0.0% 0.0% -0.0% -0.0% -0.0% eliza -0.0% 0.0% -0.0% -0.0% -0.0% event -0.0% 0.0% -0.0% -0.0% -0.0% exact-reals -0.0% 0.0% -0.0% -0.0% -0.0% exp3_8 -0.0% 0.0% -0.0% -0.0% -0.0% expert -0.0% 0.0% -0.0% -0.0% -0.0% fannkuch-redux -0.0% 0.0% -0.0% -0.0% -0.0% fasta -0.0% 0.0% -0.0% -0.0% -0.0% fem -0.0% 0.0% -0.0% -0.0% -0.0% fft -0.0% 0.0% -0.0% -0.0% -0.0% fft2 -0.0% 0.0% -0.0% -0.0% -0.0% fibheaps -0.0% 0.0% -0.0% -0.0% -0.0% fish -0.0% 0.0% -0.0% -0.0% -0.0% fluid -0.1% 0.0% -0.0% -0.0% -0.0% fulsom -0.0% 0.0% -0.0% -0.0% -0.0% gamteb -0.0% 0.0% -0.0% -0.0% -0.0% gcd -0.0% 0.0% -0.0% -0.0% -0.0% gen_regexps -0.0% 0.0% -0.0% -0.0% -0.0% genfft -0.0% 0.0% -0.0% -0.0% -0.0% gg -0.0% 0.0% -0.0% -0.0% -0.0% grep -0.0% 0.0% -0.0% -0.0% -0.0% hidden -0.0% 0.0% -0.0% -0.0% -0.0% hpg -0.1% 0.0% -0.0% -0.0% -0.0% ida -0.0% 0.0% -0.0% -0.0% -0.0% infer -0.0% 0.0% -0.0% -0.0% -0.0% integer -0.0% 0.0% -0.0% -0.0% -0.0% integrate -0.0% 0.0% -0.0% -0.0% -0.0% k-nucleotide -0.0% 0.0% -0.0% -0.0% -0.0% kahan -0.0% 0.0% -0.0% -0.0% -0.0% knights -0.0% 0.0% -0.0% -0.0% -0.0% lambda -0.0% 0.0% -0.0% -0.0% -0.0% last-piece -0.0% 0.0% -0.0% -0.0% -0.0% lcss -0.0% 0.0% -0.0% -0.0% -0.0% life -0.0% 0.0% -0.0% -0.0% -0.0% lift -0.0% 0.0% -0.0% -0.0% -0.0% linear -0.1% 0.0% -0.0% -0.0% -0.0% listcompr -0.0% 0.0% -0.0% -0.0% -0.0% listcopy -0.0% 0.0% -0.0% -0.0% -0.0% maillist -0.0% 0.0% -0.0% -0.0% -0.0% mandel -0.0% 0.0% -0.0% -0.0% -0.0% mandel2 -0.0% 0.0% -0.0% -0.0% -0.0% mate -0.0% 0.0% -0.0% -0.0% -0.0% minimax -0.0% 0.0% -0.0% -0.0% -0.0% mkhprog -0.0% 0.0% -0.0% -0.0% -0.0% multiplier -0.0% 0.0% -0.0% -0.0% -0.0% n-body -0.0% 0.0% -0.0% -0.0% -0.0% nucleic2 -0.0% 0.0% -0.0% -0.0% -0.0% para -0.0% 0.0% -0.0% -0.0% -0.0% paraffins -0.0% 0.0% -0.0% -0.0% -0.0% parser -0.1% 0.0% -0.0% -0.0% -0.0% parstof -0.1% 0.0% -0.0% -0.0% -0.0% pic -0.0% 0.0% -0.0% -0.0% -0.0% pidigits -0.0% 0.0% -0.0% -0.0% -0.0% power -0.0% 0.0% -0.0% -0.0% -0.0% pretty -0.0% 0.0% -0.3% -0.4% -0.4% primes -0.0% 0.0% -0.0% -0.0% -0.0% primetest -0.0% 0.0% -0.0% -0.0% -0.0% prolog -0.0% 0.0% -0.0% -0.0% -0.0% puzzle -0.0% 0.0% -0.0% -0.0% -0.0% queens -0.0% 0.0% -0.0% -0.0% -0.0% reptile -0.0% 0.0% -0.0% -0.0% -0.0% reverse-complem -0.0% 0.0% -0.0% -0.0% -0.0% rewrite -0.0% 0.0% -0.0% -0.0% -0.0% rfib -0.0% 0.0% -0.0% -0.0% -0.0% rsa -0.0% 0.0% -0.0% -0.0% -0.0% scc -0.0% 0.0% -0.3% -0.5% -0.4% sched -0.0% 0.0% -0.0% -0.0% -0.0% scs -0.0% 0.0% -0.0% -0.0% -0.0% simple -0.1% 0.0% -0.0% -0.0% -0.0% solid -0.0% 0.0% -0.0% -0.0% -0.0% sorting -0.0% 0.0% -0.0% -0.0% -0.0% spectral-norm -0.0% 0.0% -0.0% -0.0% -0.0% sphere -0.0% 0.0% -0.0% -0.0% -0.0% symalg -0.0% 0.0% -0.0% -0.0% -0.0% tak -0.0% 0.0% -0.0% -0.0% -0.0% transform -0.0% 0.0% -0.0% -0.0% -0.0% treejoin -0.0% 0.0% -0.0% -0.0% -0.0% typecheck -0.0% 0.0% -0.0% -0.0% -0.0% veritas -0.0% 0.0% -0.0% -0.0% -0.0% wang -0.0% 0.0% -0.0% -0.0% -0.0% wave4main -0.0% 0.0% -0.0% -0.0% -0.0% wheel-sieve1 -0.0% 0.0% -0.0% -0.0% -0.0% wheel-sieve2 -0.0% 0.0% -0.0% -0.0% -0.0% x2n1 -0.0% 0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Min -0.1% 0.0% -0.3% -0.5% -0.5% Max -0.0% 0.0% -0.0% -0.0% -0.0% Geometric Mean -0.0% -0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- circsim -0.1% 0.0% -0.0% -0.0% -0.0% constraints -0.0% 0.0% -0.0% -0.0% -0.0% fibheaps -0.0% 0.0% -0.0% -0.0% -0.0% gc_bench -0.0% 0.0% -0.0% -0.0% -0.0% hash -0.0% 0.0% -0.0% -0.0% -0.0% lcss -0.0% 0.0% -0.0% -0.0% -0.0% power -0.0% 0.0% -0.0% -0.0% -0.0% spellcheck -0.0% 0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Min -0.1% 0.0% -0.0% -0.0% -0.0% Max -0.0% 0.0% -0.0% -0.0% -0.0% Geometric Mean -0.0% +0.0% -0.0% -0.0% -0.0% Manual inspection of programs in testsuite/tests/programs --------------------------------------------------------- I built these programs with a bunch of dump flags and `-O` and compared STG, Cmm, and Asm dumps and file sizes. (Below the numbers in parenthesis show number of modules in the program) These programs have identical compiler (same .hi and .o sizes, STG, and Cmm and Asm dumps): - Queens (1), andre_monad (1), cholewo-eval (2), cvh_unboxing (3), andy_cherry (7), fun_insts (1), hs-boot (4), fast2haskell (2), jl_defaults (1), jq_readsPrec (1), jules_xref (1), jtod_circint (4), jules_xref2 (1), lennart_range (1), lex (1), life_space_leak (1), bargon-mangler-bug (7), record_upd (1), rittri (1), sanders_array (1), strict_anns (1), thurston-module-arith (2), okeefe_neural (1), joao-circular (6), 10queens (1) Programs with different compiler outputs: - jl_defaults (1): For some reason GHC HEAD marks a lot of top-level `[Int]` closures as CAFFY for no reason. With this patch we no longer make them CAFFY and generate less SRT entries. For some reason Main.o is slightly larger with this patch (1.3%) and the executable sizes are the same. (I'd expect both to be smaller) - launchbury (1): Same as jl_defaults: top-level `[Int]` closures marked as CAFFY for no reason. Similarly `Main.o` is 1.4% larger but the executable sizes are the same. - galois_raytrace (13): Differences are in the Parse module. There are a lot, but some of the changes are caused by the fact that for some reason (I think a bug) GHC HEAD marks the dictionary for `Functor Identity` as CAFFY. Parse.o is 0.4% larger, the executable size is the same. - north_array: We now generate less SRT entries because some of array primops used in this program like `NewArrayOp` get eliminated during Stg-to-Cmm and turn some CAFFY things into non-CAFFY. Main.o gets 24% larger (9224 bytes from 9000 bytes), executable sizes are the same. - seward-space-leak: Difference in this program is better shown by this smaller example: module Lib where data CDS = Case [CDS] [(Int, CDS)] \| Call CDS CDS instance Eq CDS where Case sels1 rets1 == Case sels2 rets2 = sels1 == sels2 && rets1 == rets2 Call a1 b1 == Call a2 b2 = a1 == a2 && b1 == b2 _ == _ = False In this program GHC HEAD builds a new SRT for the recursive group of `(==)`, `(/=)` and the dictionary closure. Then `/=` points to `==` in its SRT field, and `==` uses the SRT object as its SRT. With this patch we use the closure for `/=` as the SRT and add `==` there. Then `/=` gets an empty SRT field and `==` points to `/=` in its SRT field. This change looks fine to me. Main.o gets 0.07% larger, executable sizes are identical. head.hackage ------------ head.hackage's CI script builds 428 packages from Hackage using this patch with no failures. Compiler performance -------------------- The compiler perf tests report that the compiler allocates slightly more (worst case observed so far is 4%). However most programs in the test suite are small, single file programs. To benchmark compiler performance on something more realistic I build Cabal (the library, 236 modules) with different optimisation levels. For the "max residency" row I run GHC with `+RTS -s -A100k -i0 -h` for more accurate numbers. Other rows are generated with just `-s`. (This is because `-i0` causes running GC much more frequently and as a result "bytes copied" gets inflated by more than 25x in some cases) * -O0 \| \| GHC HEAD \| This MR \| Diff \| \| --------------- \| -------------- \| -------------- \| ------ \| \| Bytes allocated \| 54,413,350,872 \| 54,701,099,464 \| +0.52% \| \| Bytes copied \| 4,926,037,184 \| 4,990,638,760 \| +1.31% \| \| Max residency \| 421,225,624 \| 424,324,264 \| +0.73% \| * -O1 \| \| GHC HEAD \| This MR \| Diff \| \| --------------- \| --------------- \| --------------- \| ------ \| \| Bytes allocated \| 245,849,209,992 \| 246,562,088,672 \| +0.28% \| \| Bytes copied \| 26,943,452,560 \| 27,089,972,296 \| +0.54% \| \| Max residency \| 982,643,440 \| 991,663,432 \| +0.91% \| * -O2 \| \| GHC HEAD \| This MR \| Diff \| \| --------------- \| --------------- \| --------------- \| ------ \| \| Bytes allocated \| 291,044,511,408 \| 291,863,910,912 \| +0.28% \| \| Bytes copied \| 37,044,237,616 \| 36,121,690,472 \| -2.49% \| \| Max residency \| 1,071,600,328 \| 1,086,396,256 \| +1.38% \| Extra compiler allocations -------------------------- Runtime allocations of programs are as reported above (NoFib section). The compiler now allocates more than before. Main source of allocation in this patch compared to base commit is the new SRT algorithm (GHC.Cmm.Info.Build). Below is some of the extra work we do with this patch, numbers generated by profiled stage 2 compiler when building a pathological case (the test 'ManyConstructors') with '-O2': - We now sort the final STG for a module, which means traversing the entire program, generating free variable set for each top-level binding, doing SCC analysis, and re-ordering the program. In ManyConstructors this step allocates 97,889,952 bytes. - We now do SRT analysis on static data, which in a program like ManyConstructors causes analysing 10,000 bindings that we would previously just skip. This step allocates 70,898,352 bytes. - We now maintain an SRT map for the entire module as we compile Cmm groups: data ModuleSRTInfo = ModuleSRTInfo { ... , moduleSRTMap :: SRTMap } (SRTMap is just a strict Map from the 'containers' library) This map gets an entry for most bindings in a module (exceptions are THUNKs and CAFFY static functions). For ManyConstructors this map gets 50015 entries. - Once we're done with code generation we generate a NameSet from SRTMap for the non-CAFFY names in the current module. This set gets the same number of entries as the SRTMap. - Finally we update CafInfos in ModDetails for the non-CAFFY Ids, using the NameSet generated in the previous step. This usually does the least amount of allocation among the work listed here. Only place with this patch where we do less work in the CAF analysis in the tidying pass (CoreTidy). However that doesn't save us much, as the pass still needs to traverse the whole program and update IdInfos for other reasons. Only thing we don't here do is the `hasCafRefs` pass over the RHS of bindings, which is a stateless pass that returns a boolean value, so it doesn't allocate much. (Metric changes blow are all increased allocations) Metric changes -------------- Metric Increase: ManyAlternatives ManyConstructors T13035 T14683 T1969 T9961
*	testsuite: Ensure that makefile tests get run	Ben Gamari	2019-10-17	2	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously `makefile_test` and `run_command` tests could easily end up in a situation where they wouldn't be run if the user used the `only_ways` modifier. The reason is to build the set of a ways to run the test in we first start with a candidate set determined by the test type (e.g. `makefile_test`, `compile_run`, etc.) and then filter that set with the constraints given by the test's modifiers. `makefile_test` and `run_command` tests' candidate sets were simply `{normal}`, and consequently most uses of `only_ways` would result in the test being never run. To avoid this we rather use all ways as the candidate sets for these test types. This may result in a few more testcases than we would like (given that some `run_command` tests are insensitive to way) but this can be fixed by adding modifiers and we would much rather run too many tests than too few. This fixes #16042 and a number of other tests afflicted by the same issue. However, there were a few cases that required special attention: * `T14028` is currently failing and is therefore marked as broken due to #17300 * `T-signals-child` is fragile in the `threaded1` and `threaded2` ways (tracked in #17307)
*	Compute demand signatures assuming idArity	Sebastian Graf	2019-04-30	3	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This does four things: 1. Look at `idArity` instead of manifest lambdas to decide whether to use LetUp 2. Compute the strictness signature in LetDown assuming at least `idArity` incoming arguments 3. Remove the special case for trivial RHSs, which is subsumed by 2 4. Don't perform the W/W split when doing so would eta expand a binding. Otherwise we would eta expand PAPs, causing unnecessary churn in the Simplifier. NoFib Results -------------------------------------------------------------------------------- Program Allocs Instrs -------------------------------------------------------------------------------- fannkuch-redux +0.3% 0.0% gg -0.0% -0.1% maillist +0.2% +0.2% minimax 0.0% +0.8% pretty 0.0% -0.1% reptile -0.0% -1.2% -------------------------------------------------------------------------------- Min -0.0% -1.2% Max +0.3% +0.8% Geometric Mean +0.0% -0.0%
*	Update Trac ticket URLs to point to GitLab	Ryan Scott	2019-03-15	5	-6/+6
\| \| \| \| \|	This moves all URL references to Trac tickets to their corresponding GitLab counterparts.
*	Always do the worker/wrapper split for NOINLINEs	Sebastian Graf	2019-03-07	3	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Trac #10069 revealed that small NOINLINE functions didn't get split into worker and wrapper. This was due to `certainlyWillInline` saying that any unfoldings with a guidance of `UnfWhen` inline unconditionally. That isn't the case for NOINLINE functions, so we catch this case earlier now. Nofib results: -------------------------------------------------------------------------------- Program Allocs Instrs -------------------------------------------------------------------------------- fannkuch-redux -0.3% 0.0% gg +0.0% +0.1% maillist -0.2% -0.2% minimax 0.0% -0.8% -------------------------------------------------------------------------------- Min -0.3% -0.8% Max +0.0% +0.1% Geometric Mean -0.0% -0.0% Fixes #10069. ------------------------- Metric Increase: T9233 -------------------------
*	Remove ExnStr and ThrowsExn business	Sebastian Graf	2019-02-01	3	-4/+21
\|
*	testsuite: Use makefile_test	Ben Gamari	2019-01-30	2	-4/+3
\| \| \| \| \|	This eliminates most uses of run_command in the testsuite in favor of the more structured makefile_test.
*	Revert "Batch merge"	Ben Gamari	2019-01-30	2	-3/+4
\| \| \| \|	This reverts commit 76c8fd674435a652c75a96c85abbf26f1f221876.
*	Batch merge	Ben Gamari	2019-01-30	2	-4/+3
\|
*	Comments in stranal test declarations	Ömer Sinan Ağacan	2019-01-18	1	-2/+4
\|
*	Add test for #16197wip/T16197	Ömer Sinan Ağacan	2019-01-17	4	-1/+39
\|
*	Improvements to demand analysis	Simon Peyton Jones	2018-12-12	7	-8/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch collects a few improvements triggered by Trac #15696, and fixing Trac #16029 * Stop making toCleanDmd behave specially for unlifted types. This special case was the cause of stupid behaviour in Trac #16029. And to my joy I discovered the let/app invariant rendered it unnecessary. (Maybe the special case pre-dated the let/app invariant.) Result: less special-case handling in the compiler, and better perf for the compiled code. * In WwLib.mkWWstr_one, treat seqDmd like U(AAA). It was not being so treated before, which again led to stupid code. * Update and improve Notes There are .stderr test wibbles because we get slightly different strictness signatures for an argumment of unlifted type: <L,U> rather than <S,U> for Int# <S,U> rather than <S(S),U(U)> for Int
*	Add RubbishLit for absent bindings of UnliftedRep	Sebastian Graf	2018-10-14	3	-0/+363
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Trac #9279 reminded us that the worker wrapper transformation copes really badly with absent unlifted boxed bindings. As `Note [Absent errors]` in WwLib.hs points out, we can't just use `absentError` for unlifted bindings because there is no bottom to hide the error in. So instead, we synthesise a new `RubbishLit` of type `forall (a :: TYPE 'UnliftedRep). a`, which code-gen may subsitute for any boxed value. We choose `()`, so that there is a good chance that the program crashes instead instead of leading to corrupt data, should absence analysis have been too optimistic (#11126). Reviewers: simonpj, hvr, goldfire, bgamari, simonmar Reviewed By: simonpj Subscribers: osa1, rwbarton, carter GHC Trac Issues: #15627, #9279, #4306, #11126 Differential Revision: https://phabricator.haskell.org/D5153
*	Inline wrappers earlier	Simon Peyton Jones	2018-04-20	2	-199/+199
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch has a single significant change: strictness wrapper functions are inlined earlier, in phase 2 rather than phase 0. As shown by Trac #15056, this gives a better chance for RULEs to fire. Before this change, a function that would have inlined early without strictness analyss was instead inlining late. Result: applying "optimisation" made the program worse. This does not make too much difference in nofib, but I've stumbled over the problem more than once, so even a "no-change" result would be quite acceptable. Here are the headlines: -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- cacheprof -0.5% -0.5% +2.5% +2.5% 0.0% fulsom -1.0% +2.6% -0.1% -0.1% 0.0% mate -0.6% +2.4% -0.9% -0.9% 0.0% veritas -0.7% -23.2% 0.002 0.002 0.0% -------------------------------------------------------------------------------- Min -1.4% -23.2% -12.5% -15.3% 0.0% Max +0.6% +2.6% +4.4% +4.3% +19.0% Geometric Mean -0.7% -0.2% -1.4% -1.7% +0.2% * A worthwhile reduction in binary size. * Runtimes are not to be trusted much but look as if they are moving the right way. * A really big win in veritas, described in comment:1 of Trac #15056; more fusion rules fired. * I investigated the losses in 'mate' and 'fulsom'; see #15056.
*	Add a test case from the nested CPR work	Ben Gamari	2018-04-19	3	-0/+28
\| \| \| \| \| \| \| \| \| \|	Reviewers: bgamari Reviewed By: bgamari Subscribers: thomie, carter Differential Revision: https://phabricator.haskell.org/D4565
*	Get evaluated-ness right in the back end	Simon Peyton Jones	2018-01-03	1	-40/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	See Trac #14626, comment:4. We want to maintain evaluted-ness info on Ids into the code generateor for two reasons (see Note [Preserve evaluated-ness in CorePrep] in CorePrep) - DataToTag magic - Potentially using it in the codegen (this is Gabor's current work) But it was all being done very inconsistently, and actually outright wrong -- the DataToTag magic hasn't been working for years. This patch tidies it all up, with Notes to match.
*	Fix nasty bug in w/w for absence analysis	Simon Peyton Jones	2017-10-03	4	-0/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This dark corner was exposed by Trac #14285. It involves the interaction between absence analysis and INLINABLE pragmas. There is a full explanation in Note [aBSENT_ERROR_ID] in MkCore, which you can read there. The changes in this patch are * Make exprIsHNF return True for absentError, treating absentError like an honorary data constructor. * Make absentError /not/ be diverging, unlike other error Ids. This is all a bit horrible. * While doing this I found that exprOkForSpeculation didn't have a case for value lambdas so I added one. It's not really called on lifted types much, but it seems like the right thing
*	mkDataConRep: fix bug in strictness signature (#14290)	Simon Marlow	2017-09-28	2	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The strictness signature for a data con wrapper wasn't including any dictionary arguments, which meant that bangs on the fields of a constructor with an existential context would be moved to the wrong fields. See T14290 for an example. Test Plan: * New test T14290 * validate Reviewers: simonpj, niteria, austin, bgamari, erikd Reviewed By: simonpj, bgamari Subscribers: rwbarton, thomie GHC Trac Issues: #14290 Differential Revision: https://phabricator.haskell.org/D4040
*	Allow CSE'ing of work-wrapped bindings (#14186)	Joachim Breitner	2017-09-12	2	-5/+749
\| \| \| \| \| \| \| \|	the worker/wrapper creates an artificial INLINE pragma, which caused CSE to not do its work. We now recognize such artificial pragmas by using `NoUserInline` instead of `Inline` as the `InlineSpec`. Differential Revision: https://phabricator.haskell.org/D3939
*	Add testcase for #14186	Joachim Breitner	2017-09-12	1	-17/+4
\| \| \| \| \| \|	and move the generally useful helpers check_errmsg and grep_errmsg to testlib.py. Some documentation can be found on https://ghc.haskell.org/trac/ghc/wiki/Building/RunningTests/Adding
*	Add MonadIO Q - by requiring MonadIO => Quasi	Oleg Grenrus	2017-08-05	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is follow-up to https://ghc.haskell.org/trac/ghc/ticket/10773 Reviewers: austin, goldfire, bgamari, RyanGlScott Reviewed By: RyanGlScott Subscribers: RyanGlScott, rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3816
*	Revert "Make raiseIO# produce topRes"	David Feuer	2017-04-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit da4687f63ffe5a6162e3d7856aa53de048dd0f42. It's not entirely trivial to clean up the dead code this patch introduced. In particular, when we see ``` case raiseIO# m s of s' -> e ``` we want to know that `e` is dead. For scrutinees that are properly bottom (which we don't want to consider `raiseIO# m s` to be, this is handled by rewriting `bot` to `case bot of {}`. But if we do that for `raiseIO#`, we end up with ``` case raiseIO# m s of {} ``` which looks a lot like bottom and could confuse demand analysis. I think we need to wait with this change until we have a more complete story. Reviewers: austin, bgamari Reviewed By: bgamari Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3413
*	Fix test results for T13380	David Feuer	2017-03-09	2	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	They were `expect_broken` without any output. Add the actual output and remove the `expect_broken`. Reviewers: austin, bgamari Reviewed By: bgamari Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3306
*	testsuite: Add test for #11076	Ben Gamari	2017-03-06	2	-0/+11
\|
*	tests: remove extra_files.py (#12223)	Reid Barton	2017-02-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	The script I used is included as testsuite/driver/kill_extra_files.py, though at this point it is for mostly historical interest. Some of the tests in libraries/hpc relied on extra_files.py, so this commit includes an update to that submodule. One test in libraries/process also relies on extra_files.py, but we cannot update that submodule so easily, so for now we special-case it in the test driver.
*	Do Worker/Wrapper for NOINLINE things	Eric Seidel	2017-02-05	2	-25/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Disabling worker/wrapper for NOINLINE things can cause unnecessary reboxing of values. Consider {-# NOINLINE f #-} f :: Int -> a f x = error (show x) g :: Bool -> Bool -> Int -> Int g True True p = f p g False True p = p + 1 g b False p = g b True p the strictness analysis will discover f and g are strict, but because f has no wrapper, the worker for g will rebox p. So we get $wg x y p# = let p = I# p# in -- Yikes! Reboxing! case x of False -> case y of False -> $wg False True p# True -> +# p# 1# True -> case y of False -> $wg True True p# True -> case f p of { } g x y p = case p of (I# p#) -> $wg x y p# Now, in this case the reboxing will float into the True branch, an so the allocation will only happen on the error path. But it won't float inwards if there are multiple branches that call (f p), so the reboxing will happen on every call of g. Disaster. Solution: do worker/wrapper even on NOINLINE things; but move the NOINLINE pragma to the worker. Test Plan: make test TEST="13143" Reviewers: simonpj, bgamari, dfeuer, austin Reviewed By: simonpj, bgamari Subscribers: dfeuer, thomie Differential Revision: https://phabricator.haskell.org/D3046