delta/haskell.git - gitlab.haskell.org: ghc/ghc.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	Revert "Make the specialiser handle polymorphic specialisation"	Matthew Pickering	2022-04-30	1	-11/+11
\| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit ef0135934fe32da5b5bb730dbce74262e23e72e8. See ticket #21229 ------------------------- Metric Decrease: T15164 Metric Increase: T13056 -------------------------
*	testsuite: Mark FloatFnInverses as fixed	Ben Gamari	2022-04-06	1	-2/+1
\| \| \| \| \| \|	The new toolchain has fixed it. Closes #15670.
*	Change GHC.Prim to GHC.Exts in docs and tests	Krzysztof Gogolewski	2022-04-01	4	-4/+3
\| \| \| \| \|	Users are supposed to import GHC.Exts rather than GHC.Prim. Part of #18749.
*	Demand: Let `Boxed` win in `lubBoxity` (#21119)	Sebastian Graf	2022-03-16	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, we let `Unboxed` win in `lubBoxity`, which is unsoundly optimistic in terms ob Boxity analysis. "Unsoundly" in the sense that we sometimes unbox parameters that we better shouldn't unbox. Examples are #18907 and T19871.absent. Until now, we thought that this hack pulled its weight becuase it worked around some shortcomings of the phase separation between Boxity analysis and CPR analysis. But it is a gross hack which caused regressions itself that needed all kinds of fixes and workarounds. See for example #20767. It became impossible to work with in !7599, so I want to remove it. For example, at the moment, `lubDmd B dmd` will not unbox `dmd`, but `lubDmd A dmd` will. Given that `B` is supposed to be the bottom element of the lattice, it's hardly justifiable to get a better demand when `lub`bing with `A`. The consequence of letting `Boxed` win in `lubBoxity` is that we would regress #2387, #16040 and parts of #5075 and T19871.sumIO, until Boxity and CPR are able to communicate better. Fortunately, that is not the case since I could tweak the other source of optimism in Boxity analysis that is described in `Note [Unboxed demand on function bodies returning small products]` so that we recursively assume unboxed demands on function bodies returning small products. See the updated Note. `Note [Boxity for bottoming functions]` describes why we need bottoming functions to have signatures that say that they deeply unbox their arguments. In so doing, I had to tweak `finaliseArgBoxities` so that it will never unbox recursive data constructors. This is in line with our handling of them in CPR. I updated `Note [Which types are unboxed?]` to reflect that. In turn we fix #21119, #20767, #18907, T19871.absent and get a much simpler implementation (at least to think about). We can also drop the very ad-hoc definition of `deferAfterPreciseException` and its Note in favor of the simple, intuitive definition we used to have. Metric Decrease: T16875 T18223 T18698a T18698b hard_hole_fits Metric Increase: LargeRecord MultiComponentModulesRecomp T15703 T8095 T9872d Out of all the regresions, only the one in T9872d doesn't vanish in a perf build, where the compiler is bootstrapped with -O2 and thus SpecConstr. Reason for regressions: * T9872d is due to `ty_co_subst` taking its `LiftingContext` boxed. That is because the context is passed to a function argument, for example in `liftCoSubstTyVarBndrUsing`. * In T15703, LargeRecord and T8095, we get a bit more allocations in `expand_syn` and `piResultTys`, because a `TCvSubst` isn't unboxed. In both cases that guards against reboxing in some code paths. * The same is true for MultiComponentModulesRecomp, where we get less unboxing in `GHC.Unit.Finder.$wfindInstalledHomeModule`. In a perf build, allocations actually improve by over 4%! Results on NoFib: -------------------------------------------------------------------------------- Program Allocs Instrs -------------------------------------------------------------------------------- awards -0.4% +0.3% cacheprof -0.3% +2.4% fft -1.5% -5.1% fibheaps +1.2% +0.8% fluid -0.3% -0.1% ida +0.4% +0.9% k-nucleotide +0.4% -0.1% last-piece +10.5% +13.9% lift -4.4% +3.5% mandel2 -99.7% -99.8% mate -0.4% +3.6% parser -1.0% +0.1% puzzle -11.6% +6.5% reverse-complem -3.0% +2.0% scs -0.5% +0.1% sphere -0.4% -0.2% wave4main -8.2% -0.3% -------------------------------------------------------------------------------- Summary excludes mandel2 because of excessive bias Min -11.6% -5.1% Max +10.5% +13.9% Geometric Mean -0.2% +0.3% -------------------------------------------------------------------------------- Not bad for a bug fix. The regression in `last-piece` could become a win if SpecConstr would work on non-recursive functions. The regression in `fibheaps` is due to `Note [Reboxed crud for bottoming calls]`, e.g., #21128.
*	Improve out-of-order inferred type variables	sheaf	2022-03-02	3	-22/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't instantiate type variables for :type in `GHC.Tc.Gen.App.tcInstFun`, to avoid inconsistently instantianting `r1` but not `r2` in the type forall {r1} (a :: TYPE r1) {r2} (b :: TYPE r2). ... This fixes #21088. This patch also changes the primop pretty-printer to ensure that we put all the inferred type variables first. For example, the type of reallyUnsafePtrEquality# is now forall {l :: Levity} {k :: Levity} (a :: TYPE (BoxedRep l)) (b :: TYPE (BoxedRep k)). a -> b -> Int# This means we avoid running into issue #21088 entirely with the types of primops. Users can still write a type signature where the inferred type variables don't come first, however. This change to primops had a knock-on consequence, revealing that we were sometimes performing eta reduction on keepAlive#. This patch updates tryEtaReduce to avoid eta reducing functions with no binding, bringing it in line with tryEtaReducePrep, and thus fixing #21090.
*	Make Word64 use Word64# on every architecture	Sylvain Henry	2021-11-06	1	-13/+20
\|
*	Bignum: add missing rule	Sylvain Henry	2021-10-29	1	-17/+11
\| \| \| \|	Add missing "Natural -> Integer -> Word#" rule.
*	Add test for T15547 (#15547)	Sylvain Henry	2021-10-29	3	-0/+74
\| \| \| \|	Fix #15547
*	DmdAnal: Implement Boxity Analysis (#19871)	Sebastian Graf	2021-10-24	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes some abundant reboxing of `DynFlags` in `GHC.HsToCore.Match.Literal.warnAboutOverflowedLit` (which was the topic of #19407) by introducing a Boxity analysis to GHC, done as part of demand analysis. This allows to accurately capture ad-hoc unboxing decisions previously made in worker/wrapper in demand analysis now, where the boxity info can propagate through demand signatures. See the new `Note [Boxity analysis]`. The actual fix for #19407 is described in `Note [No lazy, Unboxed demand in demand signature]`, but `Note [Finalising boxity for demand signature]` is probably a better entry-point. To support the fix for #19407, I had to change (what was) `Note [Add demands for strict constructors]` a bit (now `Note [Unboxing evaluated arguments]`). In particular, we now take care of it in `finaliseBoxity` (which is only called from demand analaysis) instead of `wantToUnboxArg`. I also had to resurrect `Note [Product demands for function body]` and rename it to `Note [Unboxed demand on function bodies returning small products]` to avoid huge regressions in `join004` and `join007`, thereby fixing #4267 again. See the updated Note for details. A nice side-effect is that the worker/wrapper transformation no longer needs to look at strictness info and other bits such as `InsideInlineableFun` flags (needed for `Note [Do not unbox class dictionaries]`) at all. It simply collects boxity info from argument demands and interprets them with a severely simplified `wantToUnboxArg`. All the smartness is in `finaliseBoxity`, which could be moved to DmdAnal completely, if it wasn't for the call to `dubiousDataConInstArgTys` which would be awkward to export. I spent some time figuring out the reason for why `T16197` failed prior to my amendments to `Note [Unboxing evaluated arguments]`. After having it figured out, I minimised it a bit and added `T16197b`, which simply compares computed strictness signatures and thus should be far simpler to eyeball. The 12% ghc/alloc regression in T11545 is because of the additional `Boxity` field in `Poly` and `Prod` that results in more allocation during `lubSubDmd` and `plusSubDmd`. I made sure in the ticky profiles that the number of calls to those functions stayed the same. We can bear such an increase here, as we recently improved it by -68% (in b760c1f). T18698* regress slightly because there is more unboxing of dictionaries happening and that causes Lint (mostly) to allocate more. Fixes #19871, #19407, #4267, #16859, #18907 and #13331. Metric Increase: T11545 T18698a T18698b Metric Decrease: T12425 T16577 T18223 T18282 T4267 T9961
*	Add test for #19641	Sylvain Henry	2021-10-22	3	-0/+44
\| \| \| \| \| \| \|	Now that Bignum predicates are inlined (!6696), we only need to add a test. Fix #19641
*	Bignum: allow naturalToWordClamp/Negate/Signum to inline (#20361)	Sylvain Henry	2021-10-07	1	-17/+16
\| \| \| \| \|	We don't need built-in rules now that bignum literals (e.g. 123 :: Natural) match with their constructors (e.g. NS 123##).
*	Constant folding for (.&.) maxBound (#20448)	Sylvain Henry	2021-10-05	3	-0/+64
\|
*	Constant folding for negate (#20347)	Sylvain Henry	2021-10-04	3	-0/+30
\| \| \| \|	Only for small integral types for now.
*	Rules for sized conversion primops (#19769)	Sylvain Henry	2021-09-30	5	-1/+176
\| \| \| \| \|	Metric Decrease: T12545
*	Add `-dsuppress-core-sizes` flag (#20342)	Sylvain Henry	2021-09-28	3	-0/+26
\| \| \| \| \|	This flag is used to remove the output of core stats per binding in Core dumps.
*	Constant-folding for timesInt2# (#20374)	Sylvain Henry	2021-09-23	3	-0/+61
\|
*	Constant folding for ctz/clz/popCnt (#20376)	Sylvain Henry	2021-09-17	3	-0/+97
\|
*	Canonicalize bignum literals	Sylvain Henry	2021-09-11	5	-8/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before this patch Integer and Natural literals were desugared into "real" Core in Core prep. Now we desugar them directly into their final ConApp form in HsToCore. We only keep the double representation for BigNat# (literals larger than a machine Word/Int) which are still desugared in Core prep. Using the final form directly allows case-of-known-constructor to fire for bignum literals, fixing #20245. Slight increase (+2.3) in T4801 which is a pathological case with Integer literals. Metric Increase: T4801 T11545
*	Only dump Core stats when requested to do so (#20342)	Sylvain Henry	2021-09-08	3	-26/+0
\|
*	Bignum: refactor conversion rules	Sylvain Henry	2021-09-07	2	-7/+7
\| \| \| \| \| \| \| \|	* make "passthrough" rules non built-in: they don't need to * enhance note about efficient conversions between numeric types * make integerFromNatural a little more efficient * fix noinline pragma for naturalToWordClamp# (at least with non built-in rules, we get warnings in cases like this)
*	fromEnum Natural: Throw error for non-representable values	Peter Lebbing	2021-09-06	3	-0/+56
\| \| \| \| \| \| \| \|	Starting with commit fe770c21, an error was thrown only for the values 2^63 to 2^64-1 inclusive (on a 64-bit machine), but not for higher values. Now, errors are thrown for all non-representable values again. Fixes #20291
*	Remove ad-hoc fromIntegral rules	Sylvain Henry	2021-08-09	2	-0/+1009
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fromIntegral is defined as: {-# NOINLINE [1] fromIntegral #-} fromIntegral :: (Integral a, Num b) => a -> b fromIntegral = fromInteger . toInteger Before this patch, we had a lot of rewrite rules for fromIntegral, to avoid passing through Integer when there is a faster way, e.g.: "fromIntegral/Int->Word" fromIntegral = \(I# x#) -> W# (int2Word# x#) "fromIntegral/Word->Int" fromIntegral = \(W# x#) -> I# (word2Int# x#) "fromIntegral/Word->Word" fromIntegral = id :: Word -> Word Since we have added sized types and primops (Word8#, Int16#, etc.) and Natural, this approach didn't really scale as there is a combinatorial explosion of types. In addition, we really want these conversions to be optimized for all these types and in every case (not only when fromIntegral is explicitly used). This patch removes all those ad-hoc fromIntegral rules. Instead we rely on inlining and built-in constant-folding rules. There are not too many native conversions between Integer/Natural and fixed size types, so we can handle them all explicitly. Foreign.C.Types was using rules to ensure that fromIntegral rules "sees" through the newtype wrappers,e.g.: {-# RULES "fromIntegral/a->CSize" fromIntegral = \x -> CSize (fromIntegral x) "fromIntegral/CSize->a" fromIntegral = \(CSize x) -> fromIntegral x #-} But they aren't necessary because coercions due to newtype deriving are pushed out of the way. So this patch removes these rules (as fromIntegral is now inlined, they won't match anymore anyway). Summary: * INLINE `fromIntegral` * Add some missing constant-folding rules * Remove every fromIntegral ad-hoc rules (fix #19907) Fix #20062 (missing fromIntegral rules for sized primitives) Performance: - T12545 wiggles (tracked by #19414) Metric Decrease: T12545 T10359 Metric Increase: T12545
*	Fix #19931	John Ericson	2021-07-21	3	-0/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The issue was the renderer for x86 addressing modes assumes native size registers, but we were passing in a possibly-smaller index in conjunction with a native-sized base pointer. The easist thing to do is just extend the register first. I also changed the other NGC backends implementing jump tables accordingly. On one hand, I think PowerPC and Sparc don't have the small sub-registers anyways so there is less to worry about. On the other hand, to the extent that's true the zero extension can become a no-op. I should give credit where it's due: @hsyl20 really did all the work for me in https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4717#note_355874, but I was daft and missed the "Oops" and so ended up spending a silly amount of time putting it all back together myself. The unregisterised backend change is a bit different, because here we are translating the actual case not a jump table, and the fix is to handle right-sized literals not addressing modes. But it makes sense to include here too because it's the same change in the subsequent commit that exposes both bugs.
*	Refactor SuggestExtension constructor in GhcHint	Alfredo Di Napoli	2021-07-21	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit refactors the SuggestExtension type constructor of the GhcHint to be more powerful and flexible. In particular, we can now embed extra user information (essentially "sugar") to help clarifying the suggestion. This makes the following possible: Suggested fix: Perhaps you intended to use GADTs or a similar language extension to enable syntax: data T where We can still give to IDEs and tools a `LangExt.Extension` they can use, but in the pretty-printed message we can tell the user a bit more on why such extension is needed. On top of that, we now have the ability to express conjuctions and disjunctons, for those cases where GHC suggests to enable "X or Y" and for the cases where we need "X and Y".
*	Avoid useless w/w split, take 2	Simon Peyton Jones	2021-06-05	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit: commit c6faa42bfb954445c09c5680afd4fb875ef03758 Author: Simon Peyton Jones <simonpj@microsoft.com> Date: Mon Mar 9 10:20:42 2020 +0000 Avoid useless w/w split This patch is just a tidy-up for the post-strictness-analysis worker wrapper split. Consider f x = x Strictnesss analysis does not lead to a w/w split, so the obvious thing is to leave it 100% alone. But actually, because the RHS is small, we ended up adding a StableUnfolding for it. There is some reason to do this if we choose /not/ do to w/w on the grounds that the function is small. See Note [Don't w/w inline small non-loop-breaker things] But there is no reason if we would not have done w/w anyway. This patch just moves the conditional to later. Easy. turns out to have a bug in it. Instead of /moving/ the conditional, I /duplicated/ it. Then in a subsequent unrelated tidy-up (087ac4eb) I removed the second (redundant) test! This patch does what I originally intended. There is also a small refactoring in GHC.Core.Unfold, to make the code clearer, but with no change in behaviour. It does, however, have a generally good effect on compile times, because we aren't dealing with so many silly stable unfoldings. Here are the non-zero changes: Metrics: compile_time/bytes allocated ------------------------------------- Baseline Test Metric value New value Change --------------------------------------------------------------------------- ManyAlternatives(normal) ghc/alloc 791969344.0 792665048.0 +0.1% ManyConstructors(normal) ghc/alloc 4351126824.0 4358303528.0 +0.2% PmSeriesG(normal) ghc/alloc 50362552.0 50482208.0 +0.2% PmSeriesS(normal) ghc/alloc 63733024.0 63619912.0 -0.2% T10421(normal) ghc/alloc 121224624.0 119695448.0 -1.3% GOOD T10421a(normal) ghc/alloc 85256392.0 83714224.0 -1.8% T10547(normal) ghc/alloc 29253072.0 29258256.0 +0.0% T10858(normal) ghc/alloc 189343152.0 187972328.0 -0.7% T11195(normal) ghc/alloc 281208248.0 279727584.0 -0.5% T11276(normal) ghc/alloc 141966952.0 142046224.0 +0.1% T11303b(normal) ghc/alloc 46228360.0 46259024.0 +0.1% T11545(normal) ghc/alloc 2663128768.0 2667412656.0 +0.2% T11822(normal) ghc/alloc 138686944.0 138760176.0 +0.1% T12227(normal) ghc/alloc 482836000.0 475421056.0 -1.5% GOOD T12234(optasm) ghc/alloc 60710520.0 60781808.0 +0.1% T12425(optasm) ghc/alloc 104089000.0 104022424.0 -0.1% T12545(normal) ghc/alloc 1711759416.0 1705711528.0 -0.4% T12707(normal) ghc/alloc 991541120.0 991921776.0 +0.0% T13035(normal) ghc/alloc 108199872.0 108370704.0 +0.2% T13056(optasm) ghc/alloc 414642544.0 412580384.0 -0.5% T13253(normal) ghc/alloc 361701272.0 355838624.0 -1.6% T13253-spj(normal) ghc/alloc 157710168.0 157397768.0 -0.2% T13379(normal) ghc/alloc 370984400.0 371345888.0 +0.1% T13701(normal) ghc/alloc 2439764144.0 2441351984.0 +0.1% T14052(ghci) ghc/alloc 2154090896.0 2156671400.0 +0.1% T15164(normal) ghc/alloc 1478517688.0 1440317696.0 -2.6% GOOD T15630(normal) ghc/alloc 178053912.0 172489808.0 -3.1% T16577(normal) ghc/alloc 7859948896.0 7854524080.0 -0.1% T17516(normal) ghc/alloc 1271520128.0 1202096488.0 -5.5% GOOD T17836(normal) ghc/alloc 1123320632.0 1123922480.0 +0.1% T17836b(normal) ghc/alloc 54526280.0 54576776.0 +0.1% T17977b(normal) ghc/alloc 42706752.0 42730544.0 +0.1% T18140(normal) ghc/alloc 108834568.0 108693816.0 -0.1% T18223(normal) ghc/alloc 5539629264.0 5579500872.0 +0.7% T18304(normal) ghc/alloc 97589720.0 97196944.0 -0.4% T18478(normal) ghc/alloc 770755472.0 771232888.0 +0.1% T18698a(normal) ghc/alloc 408691160.0 374364992.0 -8.4% GOOD T18698b(normal) ghc/alloc 492419768.0 458809408.0 -6.8% GOOD T18923(normal) ghc/alloc 72177032.0 71368824.0 -1.1% T1969(normal) ghc/alloc 803523496.0 804655112.0 +0.1% T3064(normal) ghc/alloc 198411784.0 198608512.0 +0.1% T4801(normal) ghc/alloc 312416688.0 312874976.0 +0.1% T5321Fun(normal) ghc/alloc 325230680.0 325474448.0 +0.1% T5631(normal) ghc/alloc 592064448.0 593518968.0 +0.2% T5837(normal) ghc/alloc 37691496.0 37710904.0 +0.1% T783(normal) ghc/alloc 404629536.0 405064432.0 +0.1% T9020(optasm) ghc/alloc 266004608.0 266375592.0 +0.1% T9198(normal) ghc/alloc 49221336.0 49268648.0 +0.1% T9233(normal) ghc/alloc 913464984.0 742680256.0 -18.7% GOOD T9675(optasm) ghc/alloc 552296608.0 466322000.0 -15.6% GOOD T9872a(normal) ghc/alloc 1789910616.0 1793924472.0 +0.2% T9872b(normal) ghc/alloc 2315141376.0 2310338056.0 -0.2% T9872c(normal) ghc/alloc 1840422424.0 1841567224.0 +0.1% T9872d(normal) ghc/alloc 556713248.0 556838432.0 +0.0% T9961(normal) ghc/alloc 383809160.0 384601600.0 +0.2% WWRec(normal) ghc/alloc 773751272.0 753949608.0 -2.6% GOOD Residency goes down too: Metrics: compile_time/max_bytes_used ------------------------------------ Baseline Test Metric value New value Change ----------------------------------------------------------- T10370(optasm) ghc/max 42058448.0 39481672.0 -6.1% T11545(normal) ghc/max 43641392.0 43634752.0 -0.0% T15304(normal) ghc/max 29895824.0 29439032.0 -1.5% T15630(normal) ghc/max 8822568.0 8772328.0 -0.6% T18698a(normal) ghc/max 13882536.0 13787112.0 -0.7% T18698b(normal) ghc/max 14714112.0 13836408.0 -6.0% T1969(normal) ghc/max 24724128.0 24733496.0 +0.0% T3064(normal) ghc/max 14041152.0 14034768.0 -0.0% T3294(normal) ghc/max 32769248.0 32760312.0 -0.0% T9630(normal) ghc/max 41605120.0 41572184.0 -0.1% T9675(optasm) ghc/max 18652296.0 17253480.0 -7.5% Metric Decrease: T10421 T12227 T15164 T17516 T18698a T18698b T9233 T9675 WWRec Metric Increase: T12545
*	Port HsToCore messages to new infrastructure	Alfredo Di Napoli	2021-06-03	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \|	This commit converts a bunch of HsToCore (Ds) messages to use the new GHC's diagnostic message infrastructure. In particular the DsMessage type has been expanded with a lot of type constructors, each encapsulating a particular error and warning emitted during desugaring. Due to the fact that levity polymorphism checking can happen both at the Ds and at the TcRn level, a new `TcLevityCheckDsMessage` constructor has been added to the `TcRnMessage` type.
*	Bignum: match on DataCon workers in rules (#19892)	Sylvain Henry	2021-05-29	4	-4/+25
\| \| \| \| \| \| \| \| \|	We need to match on DataCon workers for the rules to be triggered. T13701 ghc/alloc decreases by ~2.5% on some archs Metric Decrease: T13701
*	Nested CPR light (#19398)	Sebastian Graf	2021-03-20	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While fixing #19232, it became increasingly clear that the vestigial hack described in `Note [Optimistic field binder CPR]` is complicated and causes reboxing. Rather than make the hack worse, this patch gets rid of it completely in favor of giving deeply unboxed parameters the Nested CPR property. Example: ```hs f :: (Int, Int) -> Int f p = case p of (x, y) \| x == y = x \| otherwise = y ``` Based on `p`'s `idDemandInfo` `1P(1P(L),1P(L))`, we can see that both fields of `p` will be available unboxed. As a result, we give `p` the nested CPR property `1(1,1)`. When analysing the `case`, the field CPRs are transferred to the binders `x` and `y`, respectively, so that we ultimately give `f` the CPR property. I took the liberty to do a bit of refactoring: - I renamed `CprResult` ("Constructed product result result") to plain `Cpr`. - I Introduced `FlatConCpr` in addition to (now nested) `ConCpr` and and according pattern synonym that rewrites flat `ConCpr` to `FlatConCpr`s, purely for compiler perf reasons. - Similarly for performance reasons, we now store binders with a Top signature in a separate `IntSet`, see `Note [Efficient Top sigs in SigEnv]`. - I moved a bit of stuff around in `GHC.Core.Opt.WorkWrap.Utils` and introduced `UnboxingDecision` to replace the `Maybe DataConPatContext` type we used to return from `wantToUnbox`. - Since the `Outputable Cpr` instance changed anyway, I removed the leading `m` which we used to emit for `ConCpr`. It's just noise, especially now that we may output nested CPRs. Fixes #19398.
*	Add a test for fromInteger :: Integer -> Float/Double (#15926, #17231, #17782)	ARATA Mizuki	2021-03-17	3	-0/+85
\|
*	DmdAnal: Better syntax for demand signatures (#19016)	Sebastian Graf	2021-03-03	2	-5/+5
\| \| \| \| \| \| \| \| \|	The update of the Outputable instance resulted in a slew of documentation changes within Notes that used the old syntax. The most important doc changes are to `Note [Demand notation]` and the user's guide. Fixes #19016.
*	Fix array and cleanup conversion primops (#19026)	Sylvain Henry	2021-03-03	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The first change makes the array ones use the proper fixed-size types, which also means that just like before, they can be used without explicit conversions with the boxed sized types. (Before, it was Int# / Word# on both sides, now it is fixed sized on both sides). For the second change, don't use "extend" or "narrow" in some of the user-facing primops names for conversions. - Names like `narrowInt32#` are misleading when `Int` is 32-bits. - Names like `extendInt64#` are flat-out wrong when `Int is 32-bits. - `narrow{Int,Word}<N>#` however map a type to itself, and so don't suffer from this problem. They are left as-is. These changes are batched together because Alex happend to use the array ops. We can only use released versions of Alex at this time, sadly, and I don't want to have to have a release thatwon't work for the final GHC 9.2. So by combining these we get all the changes for Alex done at once. Bump hackage state in a few places, and also make that workflow slightly easier for the future. Bump minimum Alex version Bump Cabal, array, bytestring, containers, text, and binary submodules
*	Fix terrible occurrence-analysis bug	Simon Peyton Jones	2021-03-01	1	-10/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ticket #19360 showed up a terrible bug in the occurrence analyser, in a situation like this Rec { f = g ; g = ..f... {-# RULE g .. = ...f... #-} } Then f was postInlineUnconditionally, but not in the RULE (which is simplified first), so we had a RULE mentioning a variable that was not in scope. This led me to review (again) the subtle loop-breaker stuff in the occurrence analyser. The actual changes are few, and are largely simplifications. I did a /lot/ of comment re-organising though. There was an unexpected amount of fallout. * Validation failed when compiling the stage2 compiler with profiling on. That turned to tickle a second latent bug in the same OccAnal code (at least I think it was always there), which led me to simplify still further; see Note [inl_fvs] in GHC.Core.Opt.OccurAnal. * But that in turn let me to some strange behaviour in CSE when ticks are in the picture, which I duly fixed. See Note [Dealing with ticks] in GHC.Core.Opt.CSE. * Then I got an ASSERT failure in CoreToStg, which again seems to be a latent bug. See Note [Ticks in applications] in GHC.CoreToStg * I also made one unforced change: I now simplify the RHS of a RULE in the same way as the RHS of a stable unfolding. This can allow a trivial binding to disappear sooner than otherwise, and I don't think it has any downsides. The change is in GHC.Core.Opt.Simplify.simplRules.
*	Bignum: fix for Integer/Natural Ord instances	Sylvain Henry	2021-01-17	1	-10/+6
\| \| \| \| \| \| \| \| \|	* allow `integerCompare` to inline into `integerLe#`, etc. * use `naturalSubThrow` to implement Natural's `(-)` * use `naturalNegate` to implement Natural's `negate` * implement and use `integerToNaturalThrow` to implement Natural's `fromInteger` Thanks to @christiaanb for reporting these
*	base: add Numeric.{readBin, showBin} (fix #19036)	Artem Pelenitsyn	2021-01-02	1	-2/+0
\|
*	Display FFI labels (fix #18539)	Sylvain Henry	2020-12-11	1	-1/+1
\|
*	[Sized Cmm] properly retain sizes.	Moritz Angermann	2020-11-26	2	-5/+139
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This replaces all Word<N> = W<N># Word# and Int<N> = I<N># Int# with Word<N> = W<N># Word<N># and Int<N> = I<N># Int<N>#, thus providing us with properly sized primitives in the codegenerator instead of pretending they are all full machine words. This came up when implementing darwinpcs for arm64. The darwinpcs reqires us to pack function argugments in excess of registers on the stack. While most procedure call standards (pcs) assume arguments are just passed in 8 byte slots; and thus the caller does not know the exact signature to make the call, darwinpcs requires us to adhere to the prototype, and thus have the correct sizes. If we specify CInt in the FFI call, it should correspond to the C int, and not just be Word sized, when it's only half the size. This does change the expected output of T16402 but the new result is no less correct as it eliminates the narrowing (instead of the `and` as was previously done). Bumps the array, bytestring, text, and binary submodules. Co-Authored-By: Ben Gamari <ben@well-typed.com> Metric Increase: T13701 T14697
*	Demand: Interleave usage and strictness demands (#18903)	Sebastian Graf	2020-11-20	2	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As outlined in #18903, interleaving usage and strictness demands not only means a more compact demand representation, but also allows us to express demands that we weren't easily able to express before. Call demands are relative in the sense that a call demand `Cn(cd)` on `g` says "`g` is called `n` times. Whenever `g` is called, the result is used according to `cd`". Example from #18903: ```hs h :: Int -> Int h m = let g :: Int -> (Int,Int) g 1 = (m, 0) g n = (2 * n, 2 `div` n) {-# NOINLINE g #-} in case m of 1 -> 0 2 -> snd (g m) _ -> uncurry (+) (g m) ``` Without the interleaved representation, we would just get `L` for the strictness demand on `g`. Now we are able to express that whenever `g` is called, its second component is used strictly in denoting `g` by `1C1(P(1P(U),SP(U)))`. This would allow Nested CPR to unbox the division, for example. Fixes #18903. While fixing regressions, I also discovered and fixed #18957. Metric Decrease: T13253-spj
*	Bignum: fix BigNat subtraction (#18604)	Sylvain Henry	2020-08-26	3	-0/+12
\| \| \| \| \| \|	There was a confusion between the boolean expected by withNewWordArrayTrimedMaybe and the boolean returned by subtracting functions.
*	Bignum: fix powMod for gmp backend (#18515)	Sylvain Henry	2020-07-30	3	-0/+14
\| \| \| \| \|	Also reenable integerPowMod test which had never been reenabled by mistake.
*	Fix bug in Natural multiplication (fix #18509)	Sylvain Henry	2020-07-29	3	-0/+9
\| \| \| \| \| \|	A bug was lingering in Natural multiplication (inverting two limbs) despite QuickCheck tests used during the development leading to wrong results (independently of the selected backend).
*	Bignum: add support for negative shifts (fix #18499)	Sylvain Henry	2020-07-28	3	-0/+44
\| \| \| \| \| \| \|	shiftR/shiftL support negative arguments despite Haskell 2010 report saying otherwise. We explicitly test for negative values which is bad (it gets in the way of constant folding, etc.). Anyway, for consistency we fix Bits instancesof Integer/Natural.
*	This patch addresses the exponential blow-up in the simplifier.	Simon Peyton Jones	2020-07-28	2	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Specifically: #13253 exponential inlining #10421 ditto #18140 strict constructors #18282 another nested-function call case This patch makes one really significant changes: change the way that mkDupableCont handles StrictArg. The details are explained in GHC.Core.Opt.Simplify Note [Duplicating StrictArg]. Specific changes * In mkDupableCont, when making auxiliary bindings for the other arguments of a call, add extra plumbing so that we don't forget the demand on them. Otherwise we haev to wait for another round of strictness analysis. But actually all the info is to hand. This change affects: - Make the strictness list in ArgInfo be [Demand] instead of [Bool], and rename it to ai_dmds. - Add as_dmd to ValArg - Simplify.makeTrivial takes a Demand - mkDupableContWithDmds takes a [Demand] There are a number of other small changes 1. For Ids that are used at most once in each branch of a case, make the occurrence analyser record the total number of syntactic occurrences. Previously we recorded just OneBranch or MultipleBranches. I thought this was going to be useful, but I ended up barely using it; see Note [Note [Suppress exponential blowup] in GHC.Core.Opt.Simplify.Utils Actual changes: * See the occ_n_br field of OneOcc. * postInlineUnconditionally 2. I found a small perf buglet in SetLevels; see the new function GHC.Core.Opt.SetLevels.hasFreeJoin 3. Remove the sc_cci field of StrictArg. I found I could get its information from the sc_fun field instead. Less to get wrong! 4. In ArgInfo, arrange that ai_dmds and ai_discs have a simpler invariant: they line up with the value arguments beyond ai_args This allowed a bit of nice refactoring; see isStrictArgInfo, lazyArgcontext, strictArgContext There is virtually no difference in nofib. (The runtime numbers are bogus -- I tried a few manually.) Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- fft +0.0% -2.0% -48.3% -49.4% 0.0% multiplier +0.0% -2.2% -50.3% -50.9% 0.0% -------------------------------------------------------------------------------- Min -0.4% -2.2% -59.2% -60.4% 0.0% Max +0.0% +0.1% +3.3% +4.9% 0.0% Geometric Mean +0.0% -0.0% -33.2% -34.3% -0.0% Test T18282 is an existing example of these deeply-nested strict calls. We get a big decrease in compile time (-85%) because so much less inlining takes place. Metric Decrease: T18282
*	Reduce result discount in conSize	Simon Peyton Jones	2020-07-13	3	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ticket #18282 showed that the result discount given by conSize was massively too large. This patch reduces that discount to a constant 10, which just balances the cost of the constructor application itself. Note [Constructor size and result discount] elaborates, as does the ticket #18282. Reducing result discount reduces inlining, which affects perf. I found that I could increase the unfoldingUseThrehold from 80 to 90 in compensation; in combination with the result discount change I get these overall nofib numbers: Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- boyer -0.2% +5.4% -3.2% -3.4% 0.0% cichelli -0.1% +5.9% -11.2% -11.7% 0.0% compress2 -0.2% +9.6% -6.0% -6.8% 0.0% cryptarithm2 -0.1% -3.9% -6.0% -5.7% 0.0% gamteb -0.2% +2.6% -13.8% -14.4% 0.0% genfft -0.1% -1.6% -29.5% -29.9% 0.0% gg -0.0% -2.2% -17.2% -17.8% -20.0% life -0.1% -2.2% -62.3% -63.4% 0.0% mate +0.0% +1.4% -5.1% -5.1% -14.3% parser -0.2% -2.1% +7.4% +6.7% 0.0% primetest -0.2% -12.8% -14.3% -14.2% 0.0% puzzle -0.2% +2.1% -10.0% -10.4% 0.0% rsa -0.2% -11.7% -3.7% -3.8% 0.0% simple -0.2% +2.8% -36.7% -38.3% -2.2% wheel-sieve2 -0.1% -19.2% -48.8% -49.2% -42.9% -------------------------------------------------------------------------------- Min -0.4% -19.2% -62.3% -63.4% -42.9% Max +0.3% +9.6% +7.4% +11.0% +16.7% Geometric Mean -0.1% -0.3% -17.6% -18.0% -0.7% I'm ok with these numbers, remembering that this change removes an exponential increase in code size in some in-the-wild cases. I investigated compress2. The difference is entirely caused by this function no longer inlining WriteRoutines.$woutputCodes = \ (w :: [CodeEvent]) -> let result_s1Sr = case WriteRoutines.outputCodes_$s$woutput w 0# 0# 8# 9# of (# ww1, ww2 #) -> (ww1, ww2) in (# case result_s1Sr of (x, _) -> map @Int @Char WriteRoutines.outputCodes1 x , case result_s1Sr of { (_, y) -> y } #) It was right on the cusp before, driven by the excessive result discount. Too bad! Happily, the compiler/perf tests show a number of improvements: T12227 compiler bytes-alloc -6.6% T12545 compiler bytes-alloc -4.7% T13056 compiler bytes-alloc -3.3% T15263 runtime bytes-alloc -13.1% T17499 runtime bytes-alloc -14.3% T3294 compiler bytes-alloc -1.1% T5030 compiler bytes-alloc -11.7% T9872a compiler bytes-alloc -2.0% T9872b compiler bytes-alloc -1.2% T9872c compiler bytes-alloc -1.5% Metric Decrease: T12227 T12545 T13056 T15263 T17499 T3294 T5030 T9872a T9872b T9872c
*	BigNum: rename BigNat types	Sylvain Henry	2020-07-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before this patch BigNat names were confusing because we had: * GHC.Num.BigNat.BigNat: unlifted type used everywhere else * GHC.Num.BigNat.BigNatW: lifted type only used to share static constants * GHC.Natural.BigNat: lifted type only used for backward compatibility After this patch we have: * GHC.Num.BigNat.BigNat#: unlifted type * GHC.Num.BigNat.BigNat: lifted type (reexported from GHC.Natural) Thanks to @RyanGlScott for spotting this.
*	Fix ghc-bignum exceptions	Sylvain Henry	2020-06-27	3	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We must ensure that exceptions are not simplified. Previously we used: case raiseDivZero of _ -> 0## -- dummyValue But it was wrong because the evaluation of `raiseDivZero` was removed and the dummy value was directly returned. See new Note [ghc-bignum exceptions]. I've also removed the exception triggering primops which were fragile. We don't need them to be primops, we can have them exported by ghc-prim. I've also added a test for #18359 which triggered this patch.
*	Update testsuite	Sylvain Henry	2020-06-17	4	-18/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* support detection of slow ghc-bignum backend (to replace the detection of integer-simple use). There are still some test cases that the native backend doesn't handle efficiently enough. * remove tests for GMP only functions that have been removed from ghc-bignum * fix test results showing dependent packages (e.g. integer-gmp) or showing suggested instances * fix test using Integer/Natural API or showing internal names
*	CprAnal: Don't attach CPR sigs to expandable bindings (#18154)	Sebastian Graf	2020-05-13	3	-9/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead, look through expandable unfoldings in `cprTransform`. See the new Note [CPR for expandable unfoldings]: ``` Long static data structures (whether top-level or not) like xs = x1 : xs1 xs1 = x2 : xs2 xs2 = x3 : xs3 should not get CPR signatures, because they * Never get WW'd, so their CPR signature should be irrelevant after analysis (in fact the signature might even be harmful for that reason) * Would need to be inlined/expanded to see their constructed product * Recording CPR on them blows up interface file sizes and is redundant with their unfolding. In case of Nested CPR, this blow-up can be quadratic! But we can't just stop giving DataCon application bindings the CPR property, for example fac 0 = 1 fac n = n * fac (n-1) fac certainly has the CPR property and should be WW'd! But FloatOut will transform the first clause to lvl = 1 fac 0 = lvl If lvl doesn't have the CPR property, fac won't either. But lvl doesn't have a CPR signature to extrapolate into a CPR transformer ('cprTransform'). So instead we keep on cprAnal'ing through expandable unfoldings for these arity 0 bindings via 'cprExpandUnfolding_maybe'. In practice, GHC generates a lot of (nested) TyCon and KindRep bindings, one for each data declaration. It's wasteful to attach CPR signatures to each of them (and intractable in case of Nested CPR). ``` Fixes #18154.
*	Separate CPR analysis from the Demand analyserwip/sep-cpr	Sebastian Graf	2020-02-12	3	-13/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The reasons for that can be found in the wiki: https://gitlab.haskell.org/ghc/ghc/wikis/nested-cpr/split-off-cpr We now run CPR after demand analysis (except for after the final demand analysis run just before code gen). CPR got its own dump flags (`-ddump-cpr-anal`, `-ddump-cpr-signatures`), but not its own flag to activate/deactivate. It will run with `-fstrictness`/`-fworker-wrapper`. As explained on the wiki page, this step is necessary for a sane Nested CPR analysis. And it has quite positive impact on compiler performance: Metric Decrease: T9233 T9675 T9961 T15263
*	testsuite: Fix -Wcompat-unqualified-imports issues	Ben Gamari	2020-02-08	2	-2/+2
\|
*	Do CafInfo/SRT analysis in Cmm	Ömer Sinan Ağacan	2020-01-31	4	-25/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch removes all CafInfo predictions and various hacks to preserve predicted CafInfos from the compiler and assigns final CafInfos to interface Ids after code generation. SRT analysis is extended to support static data, and Cmm generator is modified to allow generating static_link fields after SRT analysis. This also fixes `-fcatch-bottoms`, which introduces error calls in case expressions in CorePrep, which runs after CoreTidy (which is where we decide on CafInfos) and turns previously non-CAFFY things into CAFFY. Fixes #17648 Fixes #9718 Evaluation ========== NoFib ----- Boot with: `make boot mode=fast` Run: `make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1` -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- CS -0.0% 0.0% -0.0% -0.0% -0.0% CSD -0.0% 0.0% -0.0% -0.0% -0.0% FS -0.0% 0.0% -0.0% -0.0% -0.0% S -0.0% 0.0% -0.0% -0.0% -0.0% VS -0.0% 0.0% -0.0% -0.0% -0.0% VSD -0.0% 0.0% -0.0% -0.0% -0.5% VSM -0.0% 0.0% -0.0% -0.0% -0.0% anna -0.1% 0.0% -0.0% -0.0% -0.0% ansi -0.0% 0.0% -0.0% -0.0% -0.0% atom -0.0% 0.0% -0.0% -0.0% -0.0% awards -0.0% 0.0% -0.0% -0.0% -0.0% banner -0.0% 0.0% -0.0% -0.0% -0.0% bernouilli -0.0% 0.0% -0.0% -0.0% -0.0% binary-trees -0.0% 0.0% -0.0% -0.0% -0.0% boyer -0.0% 0.0% -0.0% -0.0% -0.0% boyer2 -0.0% 0.0% -0.0% -0.0% -0.0% bspt -0.0% 0.0% -0.0% -0.0% -0.0% cacheprof -0.0% 0.0% -0.0% -0.0% -0.0% calendar -0.0% 0.0% -0.0% -0.0% -0.0% cichelli -0.0% 0.0% -0.0% -0.0% -0.0% circsim -0.0% 0.0% -0.0% -0.0% -0.0% clausify -0.0% 0.0% -0.0% -0.0% -0.0% comp_lab_zift -0.0% 0.0% -0.0% -0.0% -0.0% compress -0.0% 0.0% -0.0% -0.0% -0.0% compress2 -0.0% 0.0% -0.0% -0.0% -0.0% constraints -0.0% 0.0% -0.0% -0.0% -0.0% cryptarithm1 -0.0% 0.0% -0.0% -0.0% -0.0% cryptarithm2 -0.0% 0.0% -0.0% -0.0% -0.0% cse -0.0% 0.0% -0.0% -0.0% -0.0% digits-of-e1 -0.0% 0.0% -0.0% -0.0% -0.0% digits-of-e2 -0.0% 0.0% -0.0% -0.0% -0.0% dom-lt -0.0% 0.0% -0.0% -0.0% -0.0% eliza -0.0% 0.0% -0.0% -0.0% -0.0% event -0.0% 0.0% -0.0% -0.0% -0.0% exact-reals -0.0% 0.0% -0.0% -0.0% -0.0% exp3_8 -0.0% 0.0% -0.0% -0.0% -0.0% expert -0.0% 0.0% -0.0% -0.0% -0.0% fannkuch-redux -0.0% 0.0% -0.0% -0.0% -0.0% fasta -0.0% 0.0% -0.0% -0.0% -0.0% fem -0.0% 0.0% -0.0% -0.0% -0.0% fft -0.0% 0.0% -0.0% -0.0% -0.0% fft2 -0.0% 0.0% -0.0% -0.0% -0.0% fibheaps -0.0% 0.0% -0.0% -0.0% -0.0% fish -0.0% 0.0% -0.0% -0.0% -0.0% fluid -0.1% 0.0% -0.0% -0.0% -0.0% fulsom -0.0% 0.0% -0.0% -0.0% -0.0% gamteb -0.0% 0.0% -0.0% -0.0% -0.0% gcd -0.0% 0.0% -0.0% -0.0% -0.0% gen_regexps -0.0% 0.0% -0.0% -0.0% -0.0% genfft -0.0% 0.0% -0.0% -0.0% -0.0% gg -0.0% 0.0% -0.0% -0.0% -0.0% grep -0.0% 0.0% -0.0% -0.0% -0.0% hidden -0.0% 0.0% -0.0% -0.0% -0.0% hpg -0.1% 0.0% -0.0% -0.0% -0.0% ida -0.0% 0.0% -0.0% -0.0% -0.0% infer -0.0% 0.0% -0.0% -0.0% -0.0% integer -0.0% 0.0% -0.0% -0.0% -0.0% integrate -0.0% 0.0% -0.0% -0.0% -0.0% k-nucleotide -0.0% 0.0% -0.0% -0.0% -0.0% kahan -0.0% 0.0% -0.0% -0.0% -0.0% knights -0.0% 0.0% -0.0% -0.0% -0.0% lambda -0.0% 0.0% -0.0% -0.0% -0.0% last-piece -0.0% 0.0% -0.0% -0.0% -0.0% lcss -0.0% 0.0% -0.0% -0.0% -0.0% life -0.0% 0.0% -0.0% -0.0% -0.0% lift -0.0% 0.0% -0.0% -0.0% -0.0% linear -0.1% 0.0% -0.0% -0.0% -0.0% listcompr -0.0% 0.0% -0.0% -0.0% -0.0% listcopy -0.0% 0.0% -0.0% -0.0% -0.0% maillist -0.0% 0.0% -0.0% -0.0% -0.0% mandel -0.0% 0.0% -0.0% -0.0% -0.0% mandel2 -0.0% 0.0% -0.0% -0.0% -0.0% mate -0.0% 0.0% -0.0% -0.0% -0.0% minimax -0.0% 0.0% -0.0% -0.0% -0.0% mkhprog -0.0% 0.0% -0.0% -0.0% -0.0% multiplier -0.0% 0.0% -0.0% -0.0% -0.0% n-body -0.0% 0.0% -0.0% -0.0% -0.0% nucleic2 -0.0% 0.0% -0.0% -0.0% -0.0% para -0.0% 0.0% -0.0% -0.0% -0.0% paraffins -0.0% 0.0% -0.0% -0.0% -0.0% parser -0.1% 0.0% -0.0% -0.0% -0.0% parstof -0.1% 0.0% -0.0% -0.0% -0.0% pic -0.0% 0.0% -0.0% -0.0% -0.0% pidigits -0.0% 0.0% -0.0% -0.0% -0.0% power -0.0% 0.0% -0.0% -0.0% -0.0% pretty -0.0% 0.0% -0.3% -0.4% -0.4% primes -0.0% 0.0% -0.0% -0.0% -0.0% primetest -0.0% 0.0% -0.0% -0.0% -0.0% prolog -0.0% 0.0% -0.0% -0.0% -0.0% puzzle -0.0% 0.0% -0.0% -0.0% -0.0% queens -0.0% 0.0% -0.0% -0.0% -0.0% reptile -0.0% 0.0% -0.0% -0.0% -0.0% reverse-complem -0.0% 0.0% -0.0% -0.0% -0.0% rewrite -0.0% 0.0% -0.0% -0.0% -0.0% rfib -0.0% 0.0% -0.0% -0.0% -0.0% rsa -0.0% 0.0% -0.0% -0.0% -0.0% scc -0.0% 0.0% -0.3% -0.5% -0.4% sched -0.0% 0.0% -0.0% -0.0% -0.0% scs -0.0% 0.0% -0.0% -0.0% -0.0% simple -0.1% 0.0% -0.0% -0.0% -0.0% solid -0.0% 0.0% -0.0% -0.0% -0.0% sorting -0.0% 0.0% -0.0% -0.0% -0.0% spectral-norm -0.0% 0.0% -0.0% -0.0% -0.0% sphere -0.0% 0.0% -0.0% -0.0% -0.0% symalg -0.0% 0.0% -0.0% -0.0% -0.0% tak -0.0% 0.0% -0.0% -0.0% -0.0% transform -0.0% 0.0% -0.0% -0.0% -0.0% treejoin -0.0% 0.0% -0.0% -0.0% -0.0% typecheck -0.0% 0.0% -0.0% -0.0% -0.0% veritas -0.0% 0.0% -0.0% -0.0% -0.0% wang -0.0% 0.0% -0.0% -0.0% -0.0% wave4main -0.0% 0.0% -0.0% -0.0% -0.0% wheel-sieve1 -0.0% 0.0% -0.0% -0.0% -0.0% wheel-sieve2 -0.0% 0.0% -0.0% -0.0% -0.0% x2n1 -0.0% 0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Min -0.1% 0.0% -0.3% -0.5% -0.5% Max -0.0% 0.0% -0.0% -0.0% -0.0% Geometric Mean -0.0% -0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- circsim -0.1% 0.0% -0.0% -0.0% -0.0% constraints -0.0% 0.0% -0.0% -0.0% -0.0% fibheaps -0.0% 0.0% -0.0% -0.0% -0.0% gc_bench -0.0% 0.0% -0.0% -0.0% -0.0% hash -0.0% 0.0% -0.0% -0.0% -0.0% lcss -0.0% 0.0% -0.0% -0.0% -0.0% power -0.0% 0.0% -0.0% -0.0% -0.0% spellcheck -0.0% 0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Min -0.1% 0.0% -0.0% -0.0% -0.0% Max -0.0% 0.0% -0.0% -0.0% -0.0% Geometric Mean -0.0% +0.0% -0.0% -0.0% -0.0% Manual inspection of programs in testsuite/tests/programs --------------------------------------------------------- I built these programs with a bunch of dump flags and `-O` and compared STG, Cmm, and Asm dumps and file sizes. (Below the numbers in parenthesis show number of modules in the program) These programs have identical compiler (same .hi and .o sizes, STG, and Cmm and Asm dumps): - Queens (1), andre_monad (1), cholewo-eval (2), cvh_unboxing (3), andy_cherry (7), fun_insts (1), hs-boot (4), fast2haskell (2), jl_defaults (1), jq_readsPrec (1), jules_xref (1), jtod_circint (4), jules_xref2 (1), lennart_range (1), lex (1), life_space_leak (1), bargon-mangler-bug (7), record_upd (1), rittri (1), sanders_array (1), strict_anns (1), thurston-module-arith (2), okeefe_neural (1), joao-circular (6), 10queens (1) Programs with different compiler outputs: - jl_defaults (1): For some reason GHC HEAD marks a lot of top-level `[Int]` closures as CAFFY for no reason. With this patch we no longer make them CAFFY and generate less SRT entries. For some reason Main.o is slightly larger with this patch (1.3%) and the executable sizes are the same. (I'd expect both to be smaller) - launchbury (1): Same as jl_defaults: top-level `[Int]` closures marked as CAFFY for no reason. Similarly `Main.o` is 1.4% larger but the executable sizes are the same. - galois_raytrace (13): Differences are in the Parse module. There are a lot, but some of the changes are caused by the fact that for some reason (I think a bug) GHC HEAD marks the dictionary for `Functor Identity` as CAFFY. Parse.o is 0.4% larger, the executable size is the same. - north_array: We now generate less SRT entries because some of array primops used in this program like `NewArrayOp` get eliminated during Stg-to-Cmm and turn some CAFFY things into non-CAFFY. Main.o gets 24% larger (9224 bytes from 9000 bytes), executable sizes are the same. - seward-space-leak: Difference in this program is better shown by this smaller example: module Lib where data CDS = Case [CDS] [(Int, CDS)] \| Call CDS CDS instance Eq CDS where Case sels1 rets1 == Case sels2 rets2 = sels1 == sels2 && rets1 == rets2 Call a1 b1 == Call a2 b2 = a1 == a2 && b1 == b2 _ == _ = False In this program GHC HEAD builds a new SRT for the recursive group of `(==)`, `(/=)` and the dictionary closure. Then `/=` points to `==` in its SRT field, and `==` uses the SRT object as its SRT. With this patch we use the closure for `/=` as the SRT and add `==` there. Then `/=` gets an empty SRT field and `==` points to `/=` in its SRT field. This change looks fine to me. Main.o gets 0.07% larger, executable sizes are identical. head.hackage ------------ head.hackage's CI script builds 428 packages from Hackage using this patch with no failures. Compiler performance -------------------- The compiler perf tests report that the compiler allocates slightly more (worst case observed so far is 4%). However most programs in the test suite are small, single file programs. To benchmark compiler performance on something more realistic I build Cabal (the library, 236 modules) with different optimisation levels. For the "max residency" row I run GHC with `+RTS -s -A100k -i0 -h` for more accurate numbers. Other rows are generated with just `-s`. (This is because `-i0` causes running GC much more frequently and as a result "bytes copied" gets inflated by more than 25x in some cases) * -O0 \| \| GHC HEAD \| This MR \| Diff \| \| --------------- \| -------------- \| -------------- \| ------ \| \| Bytes allocated \| 54,413,350,872 \| 54,701,099,464 \| +0.52% \| \| Bytes copied \| 4,926,037,184 \| 4,990,638,760 \| +1.31% \| \| Max residency \| 421,225,624 \| 424,324,264 \| +0.73% \| * -O1 \| \| GHC HEAD \| This MR \| Diff \| \| --------------- \| --------------- \| --------------- \| ------ \| \| Bytes allocated \| 245,849,209,992 \| 246,562,088,672 \| +0.28% \| \| Bytes copied \| 26,943,452,560 \| 27,089,972,296 \| +0.54% \| \| Max residency \| 982,643,440 \| 991,663,432 \| +0.91% \| * -O2 \| \| GHC HEAD \| This MR \| Diff \| \| --------------- \| --------------- \| --------------- \| ------ \| \| Bytes allocated \| 291,044,511,408 \| 291,863,910,912 \| +0.28% \| \| Bytes copied \| 37,044,237,616 \| 36,121,690,472 \| -2.49% \| \| Max residency \| 1,071,600,328 \| 1,086,396,256 \| +1.38% \| Extra compiler allocations -------------------------- Runtime allocations of programs are as reported above (NoFib section). The compiler now allocates more than before. Main source of allocation in this patch compared to base commit is the new SRT algorithm (GHC.Cmm.Info.Build). Below is some of the extra work we do with this patch, numbers generated by profiled stage 2 compiler when building a pathological case (the test 'ManyConstructors') with '-O2': - We now sort the final STG for a module, which means traversing the entire program, generating free variable set for each top-level binding, doing SCC analysis, and re-ordering the program. In ManyConstructors this step allocates 97,889,952 bytes. - We now do SRT analysis on static data, which in a program like ManyConstructors causes analysing 10,000 bindings that we would previously just skip. This step allocates 70,898,352 bytes. - We now maintain an SRT map for the entire module as we compile Cmm groups: data ModuleSRTInfo = ModuleSRTInfo { ... , moduleSRTMap :: SRTMap } (SRTMap is just a strict Map from the 'containers' library) This map gets an entry for most bindings in a module (exceptions are THUNKs and CAFFY static functions). For ManyConstructors this map gets 50015 entries. - Once we're done with code generation we generate a NameSet from SRTMap for the non-CAFFY names in the current module. This set gets the same number of entries as the SRTMap. - Finally we update CafInfos in ModDetails for the non-CAFFY Ids, using the NameSet generated in the previous step. This usually does the least amount of allocation among the work listed here. Only place with this patch where we do less work in the CAF analysis in the tidying pass (CoreTidy). However that doesn't save us much, as the pass still needs to traverse the whole program and update IdInfos for other reasons. Only thing we don't here do is the `hasCafRefs` pass over the RHS of bindings, which is a stateless pass that returns a boolean value, so it doesn't allocate much. (Metric changes blow are all increased allocations) Metric changes -------------- Metric Increase: ManyAlternatives ManyConstructors T13035 T14683 T1969 T9961