| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Removed references to driver from GHC.Core.LateCC, GHC.Core.Simplify
namespace and GHC.Core.Opt.Stats.
Also removed services from configuration records.
* Renamed GHC.Core.Opt.Simplify to GHC.Core.Opt.Simplify.Iteration.
* Inlined `simplifyPgm` and renamed `simplifyPgmIO` to `simplifyPgm`
and moved the Simplify driver to GHC.Core.Opt.Simplify.
* Moved `SimplMode` and `FloatEnable` to GHC.Core.Opt.Simplify.Env.
* Added a configuration record `TopEnvConfig` for the `SimplTopEnv` environment
in GHC.Core.Opt.Simplify.Monad.
* Added `SimplifyOpts` and `SimplifyExprOpts`. Provide initialization functions
for those in a new module GHC.Driver.Config.Core.Opt.Simplify.
Also added initialization functions for `SimplMode` to that module.
* Moved `CoreToDo` and friends to a new module GHC.Core.Pipeline.Types
and the counting types and functions (`SimplCount` and `Tick`) to new
module GHC.Core.Opt.Stats.
* Added getter functions for the fields of `SimplMode`. The pedantic bottoms
option and the platform are retrieved from the ArityOpts and RuleOpts and the
getter functions allow us to retrieve values from `SpecEnv` without the
knowledge where the data is stored exactly.
* Moved the coercion optimization options from the top environment to
`SimplMode`. This way the values left in the top environment are those
dealing with monadic functionality, namely logging, IO related stuff and
counting. Added a note "The environments of the Simplify pass".
* Removed `CoreToDo` from GHC.Core.Lint and GHC.CoreToStg.Prep and got rid of
`CoreDoSimplify`. Pass `SimplifyOpts` in the `CoreToDo` type instead.
* Prep work before removing `InteractiveContext` from `HscEnv`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to put OtherCon unfoldings on lambda binders of workers
and sometimes also join points/specializations with with the
assumption that since the wrapper would force these arguments
once we execute the RHS they would indeed be in WHNF.
This was wrong for reasons detailed in #21472. So now we purge
evaluated unfoldings from *all* lambda binders.
This fixes #21472, but at the cost of sometimes not using as efficient a
calling convention. It can also change inlining behaviour as some
occurances will no longer look like value arguments when they did
before.
As consequence we also change how we compute CBV information for
arguments slightly. We now *always* determine the CBV convention
for arguments during tidy. Earlier in the pipeline we merely mark
functions as candidates for having their arguments treated as CBV.
As before the process is described in the relevant notes:
Note [CBV Function Ids]
Note [Attaching CBV Marks to ids]
Note [Never put `OtherCon` unfoldigns on lambda binders]
-------------------------
Metric Decrease:
T12425
T13035
T18223
T18223
T18923
MultiLayerModulesTH_OneShot
Metric Increase:
WWRec
-------------------------
|
|
|
|
|
|
|
|
| |
- Remove groupWithName (unused)
- Use the RuntimeRepType synonym where possible
- Replace getUniqueM + mkSysLocalOrCoVar with mkSysLocalOrCoVarM
No functional changes.
|
|
|
|
|
| |
We use compatibleRep to compare reps, and avoid checking functions with
levity polymorphic types because of #21399.
|
|
|
|
|
|
|
| |
This will mean T9208 when run with lint will return a lint error instead
of resulting in a panic.
Fixes #21117
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
previously, GHC had the "let/app-invariant" which said that the RHS of a
let or the argument of an application must be of lifted type or ok for
speculation. We want this on let to freely float them around, and we
wanted that on app to freely convert between the two (e.g. in
beta-reduction or inlining).
However, the app invariant meant that simple code didn't stay simple and
this got in the way of rules matching. By removing the app invariant,
this thus fixes #20554.
The new invariant is now called "let-can-float invariant", which is
hopefully easier to guess its meaning correctly.
Dropping the app invariant means that everywhere where we effectively do
beta-reduction (in the two simplifiers, but also in `exprIsConApp_maybe`
and other innocent looking places) we now have to check if the argument
must be evaluated (unlifted and side-effecting), and analyses have to be
adjusted to the new semantics of `App`.
Also, `LetFloats` in the simplifier can now also carry such non-floating
bindings.
The fix for DmdAnal, refine by Sebastian, makes functions with unlifted
arguments strict in these arguments, which changes some signatures.
This causes some extra calls to `exprType` and `exprOkForSpeculation`,
so some perf benchmarks regress a bit (while others improve).
Metric Decrease:
T9020
Metric Increase:
LargeRecord
T12545
T15164
T16577
T18223
T5642
T9961
Co-authored-by: Sebastian Graf <sebastian.graf@kit.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit alters GenStgAlt from a type synonym to a Record with field
accessors. In pursuit of #21078, this is not a required change but cleans
up several areas for nicer code in the upcoming js-backend, and in GHC
itself.
GenStgAlt: 3-tuple -> record
Stg.Utils: GenStgAlt 3-tuple -> record
Stg.Stats: StgAlt 3-tuple --> record
Stg.InferTags.Rewrite: StgAlt 3-tuple -> record
Stg.FVs: GenStgAlt 3-tuple -> record
Stg.CSE: GenStgAlt 3-tuple -> record
Stg.InferTags: GenStgAlt 3-tuple --> record
Stg.Debug: GenStgAlt 3-tuple --> record
Stg.Lift.Analysis: GenStgAlt 3-tuple --> record
Stg.Lift: GenStgAlt 3-tuple --> record
ByteCode.Instr: GenStgAlt 3-tuple --> record
Stg.Syntax: add GenStgAlt helper functions
Stg.Unarise: GenStgAlt 3-tuple --> record
Stg.BcPrep: GenStgAlt 3-tuple --> record
CoreToStg: GenStgAlt 3-tuple --> record
StgToCmm.Expr: GenStgAlt 3-tuple --> record
StgToCmm.Bind: GenStgAlt 3-tuple --> record
StgToByteCode: GenStgAlt 3-tuple --> record
Stg.Lint: GenStgAlt 3-tuple --> record
Stg.Syntax: strictify GenStgAlt
GenStgAlt: add haddock, some cleanup
fixup: remove calls to pure, single ViewPattern
StgToByteCode: use case over viewpatterns
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This does three major things:
* Enforce the invariant that all strict fields must contain tagged
pointers.
* Try to predict the tag on bindings in order to omit tag checks.
* Allows functions to pass arguments unlifted (call-by-value).
The former is "simply" achieved by wrapping any constructor allocations with
a case which will evaluate the respective strict bindings.
The prediction is done by a new data flow analysis based on the STG
representation of a program. This also helps us to avoid generating
redudant cases for the above invariant.
StrictWorkers are created by W/W directly and SpecConstr indirectly.
See the Note [Strict Worker Ids]
Other minor changes:
* Add StgUtil module containing a few functions needed by, but
not specific to the tag analysis.
-------------------------
Metric Decrease:
T12545
T18698b
T18140
T18923
LargeRecord
Metric Increase:
LargeRecord
ManyAlternatives
ManyConstructors
T10421
T12425
T12707
T13035
T13056
T13253
T13253-spj
T13379
T15164
T18282
T18304
T18698a
T1969
T20049
T3294
T4801
T5321FD
T5321Fun
T783
T9233
T9675
T9961
T19695
WWRec
-------------------------
|
|
|
|
|
| |
Also derive some more instances. GHC doesn't need them, but downstream
consumers may need to e.g. put stuff in maps.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes #20541 by making mkTyConApp do more sharing of types.
In particular, replace
* BoxedRep Lifted ==> LiftedRep
* BoxedRep Unlifted ==> UnliftedRep
* TupleRep '[] ==> ZeroBitRep
* TYPE ZeroBitRep ==> ZeroBitType
In each case, the thing on the right is a type synonym
for the thing on the left, declared in ghc-prim:GHC.Types.
See Note [Using synonyms to compress types] in GHC.Core.Type.
The synonyms for ZeroBitRep and ZeroBitType are new, but absolutely
in the same spirit as the other ones. (These synonyms are mainly
for internal use, though the programmer can use them too.)
I also renamed GHC.Core.Ty.Rep.isVoidTy to isZeroBitTy, to be
compatible with the "zero-bit" nomenclature above. See discussion
on !6806.
There is a tricky wrinkle: see GHC.Core.Types
Note [Care using synonyms to compress types]
Compiler allocation decreases by up to 0.8%.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use DiagOpts for diagnostic options instead of directly querying
DynFlags (#17957).
Surprising performance improvements on CI:
T4801(normal) ghc/alloc 313236344.0 306515216.0 -2.1% GOOD
T9961(normal) ghc/alloc 384502736.0 380584384.0 -1.0% GOOD
ManyAlternatives(normal) ghc/alloc 797356128.0 786644928.0 -1.3%
ManyConstructors(normal) ghc/alloc 4389732432.0 4317740880.0 -1.6%
T783(normal) ghc/alloc 408142680.0 402812176.0 -1.3%
Metric Decrease:
T4801
T9961
T783
ManyAlternatives
ManyConstructors
Bump haddock submodule
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce LogFlags as a independent subset of DynFlags used for logging.
As a consequence in many places we don't have to pass both Logger and
DynFlags anymore.
The main reason for this refactoring is that I want to refactor the
systools interfaces: for now many systools functions use DynFlags both
to use the Logger and to fetch their parameters (e.g. ldInputs for the
linker). I'm interested in refactoring the way they fetch their
parameters (i.e. use dedicated XxxOpts data types instead of DynFlags)
for #19877. But if I did this refactoring before refactoring the Logger,
we would have duplicate parameters (e.g. ldInputs from DynFlags and
linkerInputs from LinkerOpts). Hence this patch first.
Some flags don't really belong to LogFlags because they are subsystem
specific (e.g. most DumpFlags). For example -ddump-asm should better be
passed in NCGConfig somehow. This patch doesn't fix this tight coupling:
the dump flags are part of the UI but they are passed all the way down
for example to infer the file name for the dumps.
Because LogFlags are a subset of the DynFlags, we must update the former
when the latter changes (not so often). As a consequence we now use
accessors to read/write DynFlags in HscEnv instead of using `hsc_dflags`
directly.
In the process I've also made some subsystems less dependent on DynFlags:
- CmmToAsm: by passing some missing flags via NCGConfig (see new fields
in GHC.CmmToAsm.Config)
- Core.Opt.*:
- by passing -dinline-check value into UnfoldingOpts
- by fixing some Core passes interfaces (e.g. CallArity, FloatIn)
that took DynFlags argument for no good reason.
- as a side-effect GHC.Core.Opt.Pipeline.doCorePass is much less
convoluted.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit further expand on the design for #18516 by getting rid of
the `defaultReasonSeverity` in favour of a function called
`diagReasonSeverity` which correctly takes the `DynFlags` as input. The
idea is to compute the `Severity` and the `DiagnosticReason` of each
message "at birth", without doing any later re-classifications, which
are potentially error prone, as the `DynFlags` might evolve during the
course of the program.
In preparation for a proper refactoring, now `pprWarning` from the
Parser.Ppr module has been renamed to `mkParserWarn`, which now takes a
`DynFlags` as input.
We also get rid of the reclassification we were performing inside `printOrThrowWarnings`.
Last but not least, this commit removes the need for reclassify inside GHC.Tc.Errors,
and also simplifies the implementation of `maybeReportError`.
Update Haddock submodule
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Other than that:
* Fix T16167,json,json2,T7478,T10637 tests to reflect the introduction of
the `MessageClass` type
* Remove `makeIntoWarning`
* Remove `warningsToMessages`
* Refactor GHC.Tc.Errors
1. Refactors GHC.Tc.Errors so that we use `DiagnosticReason` for "choices"
(defer types errors, holes, etc);
2. We get rid of `reportWarning` and `reportError` in favour of a general
`reportDiagnostic`.
* Introduce `DiagnosticReason`, `Severity` is an enum: This big commit makes
`Severity` a simple enumeration, and introduces the concept of `DiagnosticReason`,
which classifies the /reason/ why we are emitting a particular diagnostic.
It also adds a monomorphic `DiagnosticMessage` type which is used for
generic messages.
* The `Severity` is computed (for now) from the reason, statically.
Later improvement will add a `diagReasonSeverity` function to compute
the `Severity` taking `DynFlags` into account.
* Rename `logWarnings` into `logDiagnostics`
* Add note and expand description of the `mkHoleError` function
|
|
|
|
|
|
| |
tuples and sums.
fixes #1257
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GHC Proposal: 0265-unlifted-datatypes.rst
Discussion: https://github.com/ghc-proposals/ghc-proposals/pull/265
Issues: https://gitlab.haskell.org/ghc/ghc/-/issues/19523
Implementation Details: Note [Implementation of UnliftedDatatypes]
This patch introduces the `UnliftedDatatypes` extension. When this extension is
enabled, GHC relaxes the restrictions around what result kinds are allowed in
data declarations. This allows data types for which an unlifted or
levity-polymorphic result kind is inferred.
The most significant changes are in `GHC.Tc.TyCl`, where
`Note [Implementation of UnliftedDatatypes]` describes the details of the
implementation.
Fixes #19523.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `-fdistinct-constructor-tables` flag will generate a fresh info
table for the usage of any data constructor. This is useful for
debugging as now by inspecting the info table, you can determine which
usage of a constructor caused that allocation rather than the old
situation where the info table always mapped to the definition site of
the data constructor which is useless.
In conjunction with `-hi` and `-finfo-table-map` this gives a more fine
grained understanding of where constructor allocations arise from in a
program.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this patch, the only way to override GHC's default logging
behavior was to set `log_action`, `dump_action` and `trace_action`
fields in DynFlags. This patch introduces a new Logger abstraction and
stores it in HscEnv instead.
This is part of #17957 (avoid storing state in DynFlags). DynFlags are
duplicated and updated per-module (because of OPTIONS_GHC pragma), so
we shouldn't store global state in them.
This patch also fixes a race in parallel "--make" mode which updated
the `generatedDumps` IORef concurrently.
Bump haddock submodule
The increase in MultilayerModules is tracked in #19293.
Metric Increase:
MultiLayerModules
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit boldly removes the ErrDoc and the MsgDoc from the codebase.
The former was introduced with the only purpose of classifying errors
according to their importance, but a similar result can be obtained just
by having a simple [SDoc], and placing bullets after each of them.
On top of that I have taken the perhaps controversial decision to also
banish MsgDoc, as it was merely a type alias over an SDoc and as such it wasn't
offering any extra type safety. Granted, it was perhaps making type
signatures slightly more "focused", but at the expense of cognitive
burden: if it's really just an SDoc, let's call it with its proper name.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
StgLam is used exclusively in the work of CoreToStg, but there's nothing
in the type of StgExpr that indicates this, so we're forced throughout
the Stg.* codebase to handle cases like:
case expr of
...
StgLam lam -> panic "Unexpected StgLam"
...
This patch removes the StgLam constructor from the base StgExpr so these
cases no longer need to be handled. Instead, we use a new intermediate
type in CoreToStg, PreStgRhs, to represent the RHS expression of a
binding.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code that converts promoted tuple data constructors to
`IfaceType`s in `GHC.CoreToIface` was using `isTupleDataCon`, which
conflates boxed and unboxed tuple data constructors. To avoid this,
this patch introduces `isBoxedTupleDataCon`, which is like
`isTupleDataCon` but only works for _boxed_ tuple data constructors.
While I was in town, I was horribly confused by the fact that there
were separate functions named `isUnboxedTupleCon` and
`isUnboxedTupleTyCon` (similarly, `isUnboxedSumCon` and
`isUnboxedSumTyCon`). It turns out that the former only works for
data constructors, despite its very general name! I opted to rename
`isUnboxedTupleCon` to `isUnboxedTupleDataCon` (similarly, I renamed
`isUnboxedSumCon` to `isUnboxedSumDataCon`) to avoid this potential
confusion, as well as to be more consistent with
the naming convention I used for `isBoxedTupleDataCon`.
Fixes #18644.
|
|
|
|
|
|
|
|
|
| |
* add StgPprOpts datatype
* remove Outputable instances for types that need `StgPprOpts` to be
pretty-printed and explicitly call type specific ppr functions
* add default `panicStgPprOpts` for panic messages (when it's not
convenient to thread StgPprOpts or DynFlags down to the ppr function
call)
|
|
|
|
| |
Use `withPprStyle` instead to apply a specific style to a SDoc.
|
|
|
|
|
|
|
|
|
|
|
| |
If `-dppr-debug` is set, then PprUser and PprDump styles are silently
replaced with PprDebug style. This was done in `mkUserStyle` and
`mkDumpStyle` smart constructors. As a consequence they needed a
DynFlags parameter.
Now we keep the original PprUser and PprDump styles until they are used
to create an `SDocContext`. I.e. the substitution is only performed in
`initSDocContext`.
|
|
|
|
|
|
|
| |
Introduce GHC.Unit.* hierarchy for everything concerning units, packages
and modules.
Update Haddock submodule
|
|
|
|
|
|
|
| |
Update Haddock submodule
Metric Increase:
haddock.compiler
|
|
|
|
|
|
|
| |
Update Haddock submodule
Metric Increase:
haddock.compiler
|
|
|
|
| |
Update submodule: haddock
|
|
|
|
| |
Update haddock submodule
|
|
|
|
| |
submodule updates: nofib, haddock
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes all CafInfo predictions and various hacks to preserve
predicted CafInfos from the compiler and assigns final CafInfos to
interface Ids after code generation. SRT analysis is extended to support
static data, and Cmm generator is modified to allow generating
static_link fields after SRT analysis.
This also fixes `-fcatch-bottoms`, which introduces error calls in case
expressions in CorePrep, which runs *after* CoreTidy (which is where we
decide on CafInfos) and turns previously non-CAFFY things into CAFFY.
Fixes #17648
Fixes #9718
Evaluation
==========
NoFib
-----
Boot with: `make boot mode=fast`
Run: `make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1`
--------------------------------------------------------------------------------
Program Size Allocs Instrs Reads Writes
--------------------------------------------------------------------------------
CS -0.0% 0.0% -0.0% -0.0% -0.0%
CSD -0.0% 0.0% -0.0% -0.0% -0.0%
FS -0.0% 0.0% -0.0% -0.0% -0.0%
S -0.0% 0.0% -0.0% -0.0% -0.0%
VS -0.0% 0.0% -0.0% -0.0% -0.0%
VSD -0.0% 0.0% -0.0% -0.0% -0.5%
VSM -0.0% 0.0% -0.0% -0.0% -0.0%
anna -0.1% 0.0% -0.0% -0.0% -0.0%
ansi -0.0% 0.0% -0.0% -0.0% -0.0%
atom -0.0% 0.0% -0.0% -0.0% -0.0%
awards -0.0% 0.0% -0.0% -0.0% -0.0%
banner -0.0% 0.0% -0.0% -0.0% -0.0%
bernouilli -0.0% 0.0% -0.0% -0.0% -0.0%
binary-trees -0.0% 0.0% -0.0% -0.0% -0.0%
boyer -0.0% 0.0% -0.0% -0.0% -0.0%
boyer2 -0.0% 0.0% -0.0% -0.0% -0.0%
bspt -0.0% 0.0% -0.0% -0.0% -0.0%
cacheprof -0.0% 0.0% -0.0% -0.0% -0.0%
calendar -0.0% 0.0% -0.0% -0.0% -0.0%
cichelli -0.0% 0.0% -0.0% -0.0% -0.0%
circsim -0.0% 0.0% -0.0% -0.0% -0.0%
clausify -0.0% 0.0% -0.0% -0.0% -0.0%
comp_lab_zift -0.0% 0.0% -0.0% -0.0% -0.0%
compress -0.0% 0.0% -0.0% -0.0% -0.0%
compress2 -0.0% 0.0% -0.0% -0.0% -0.0%
constraints -0.0% 0.0% -0.0% -0.0% -0.0%
cryptarithm1 -0.0% 0.0% -0.0% -0.0% -0.0%
cryptarithm2 -0.0% 0.0% -0.0% -0.0% -0.0%
cse -0.0% 0.0% -0.0% -0.0% -0.0%
digits-of-e1 -0.0% 0.0% -0.0% -0.0% -0.0%
digits-of-e2 -0.0% 0.0% -0.0% -0.0% -0.0%
dom-lt -0.0% 0.0% -0.0% -0.0% -0.0%
eliza -0.0% 0.0% -0.0% -0.0% -0.0%
event -0.0% 0.0% -0.0% -0.0% -0.0%
exact-reals -0.0% 0.0% -0.0% -0.0% -0.0%
exp3_8 -0.0% 0.0% -0.0% -0.0% -0.0%
expert -0.0% 0.0% -0.0% -0.0% -0.0%
fannkuch-redux -0.0% 0.0% -0.0% -0.0% -0.0%
fasta -0.0% 0.0% -0.0% -0.0% -0.0%
fem -0.0% 0.0% -0.0% -0.0% -0.0%
fft -0.0% 0.0% -0.0% -0.0% -0.0%
fft2 -0.0% 0.0% -0.0% -0.0% -0.0%
fibheaps -0.0% 0.0% -0.0% -0.0% -0.0%
fish -0.0% 0.0% -0.0% -0.0% -0.0%
fluid -0.1% 0.0% -0.0% -0.0% -0.0%
fulsom -0.0% 0.0% -0.0% -0.0% -0.0%
gamteb -0.0% 0.0% -0.0% -0.0% -0.0%
gcd -0.0% 0.0% -0.0% -0.0% -0.0%
gen_regexps -0.0% 0.0% -0.0% -0.0% -0.0%
genfft -0.0% 0.0% -0.0% -0.0% -0.0%
gg -0.0% 0.0% -0.0% -0.0% -0.0%
grep -0.0% 0.0% -0.0% -0.0% -0.0%
hidden -0.0% 0.0% -0.0% -0.0% -0.0%
hpg -0.1% 0.0% -0.0% -0.0% -0.0%
ida -0.0% 0.0% -0.0% -0.0% -0.0%
infer -0.0% 0.0% -0.0% -0.0% -0.0%
integer -0.0% 0.0% -0.0% -0.0% -0.0%
integrate -0.0% 0.0% -0.0% -0.0% -0.0%
k-nucleotide -0.0% 0.0% -0.0% -0.0% -0.0%
kahan -0.0% 0.0% -0.0% -0.0% -0.0%
knights -0.0% 0.0% -0.0% -0.0% -0.0%
lambda -0.0% 0.0% -0.0% -0.0% -0.0%
last-piece -0.0% 0.0% -0.0% -0.0% -0.0%
lcss -0.0% 0.0% -0.0% -0.0% -0.0%
life -0.0% 0.0% -0.0% -0.0% -0.0%
lift -0.0% 0.0% -0.0% -0.0% -0.0%
linear -0.1% 0.0% -0.0% -0.0% -0.0%
listcompr -0.0% 0.0% -0.0% -0.0% -0.0%
listcopy -0.0% 0.0% -0.0% -0.0% -0.0%
maillist -0.0% 0.0% -0.0% -0.0% -0.0%
mandel -0.0% 0.0% -0.0% -0.0% -0.0%
mandel2 -0.0% 0.0% -0.0% -0.0% -0.0%
mate -0.0% 0.0% -0.0% -0.0% -0.0%
minimax -0.0% 0.0% -0.0% -0.0% -0.0%
mkhprog -0.0% 0.0% -0.0% -0.0% -0.0%
multiplier -0.0% 0.0% -0.0% -0.0% -0.0%
n-body -0.0% 0.0% -0.0% -0.0% -0.0%
nucleic2 -0.0% 0.0% -0.0% -0.0% -0.0%
para -0.0% 0.0% -0.0% -0.0% -0.0%
paraffins -0.0% 0.0% -0.0% -0.0% -0.0%
parser -0.1% 0.0% -0.0% -0.0% -0.0%
parstof -0.1% 0.0% -0.0% -0.0% -0.0%
pic -0.0% 0.0% -0.0% -0.0% -0.0%
pidigits -0.0% 0.0% -0.0% -0.0% -0.0%
power -0.0% 0.0% -0.0% -0.0% -0.0%
pretty -0.0% 0.0% -0.3% -0.4% -0.4%
primes -0.0% 0.0% -0.0% -0.0% -0.0%
primetest -0.0% 0.0% -0.0% -0.0% -0.0%
prolog -0.0% 0.0% -0.0% -0.0% -0.0%
puzzle -0.0% 0.0% -0.0% -0.0% -0.0%
queens -0.0% 0.0% -0.0% -0.0% -0.0%
reptile -0.0% 0.0% -0.0% -0.0% -0.0%
reverse-complem -0.0% 0.0% -0.0% -0.0% -0.0%
rewrite -0.0% 0.0% -0.0% -0.0% -0.0%
rfib -0.0% 0.0% -0.0% -0.0% -0.0%
rsa -0.0% 0.0% -0.0% -0.0% -0.0%
scc -0.0% 0.0% -0.3% -0.5% -0.4%
sched -0.0% 0.0% -0.0% -0.0% -0.0%
scs -0.0% 0.0% -0.0% -0.0% -0.0%
simple -0.1% 0.0% -0.0% -0.0% -0.0%
solid -0.0% 0.0% -0.0% -0.0% -0.0%
sorting -0.0% 0.0% -0.0% -0.0% -0.0%
spectral-norm -0.0% 0.0% -0.0% -0.0% -0.0%
sphere -0.0% 0.0% -0.0% -0.0% -0.0%
symalg -0.0% 0.0% -0.0% -0.0% -0.0%
tak -0.0% 0.0% -0.0% -0.0% -0.0%
transform -0.0% 0.0% -0.0% -0.0% -0.0%
treejoin -0.0% 0.0% -0.0% -0.0% -0.0%
typecheck -0.0% 0.0% -0.0% -0.0% -0.0%
veritas -0.0% 0.0% -0.0% -0.0% -0.0%
wang -0.0% 0.0% -0.0% -0.0% -0.0%
wave4main -0.0% 0.0% -0.0% -0.0% -0.0%
wheel-sieve1 -0.0% 0.0% -0.0% -0.0% -0.0%
wheel-sieve2 -0.0% 0.0% -0.0% -0.0% -0.0%
x2n1 -0.0% 0.0% -0.0% -0.0% -0.0%
--------------------------------------------------------------------------------
Min -0.1% 0.0% -0.3% -0.5% -0.5%
Max -0.0% 0.0% -0.0% -0.0% -0.0%
Geometric Mean -0.0% -0.0% -0.0% -0.0% -0.0%
--------------------------------------------------------------------------------
Program Size Allocs Instrs Reads Writes
--------------------------------------------------------------------------------
circsim -0.1% 0.0% -0.0% -0.0% -0.0%
constraints -0.0% 0.0% -0.0% -0.0% -0.0%
fibheaps -0.0% 0.0% -0.0% -0.0% -0.0%
gc_bench -0.0% 0.0% -0.0% -0.0% -0.0%
hash -0.0% 0.0% -0.0% -0.0% -0.0%
lcss -0.0% 0.0% -0.0% -0.0% -0.0%
power -0.0% 0.0% -0.0% -0.0% -0.0%
spellcheck -0.0% 0.0% -0.0% -0.0% -0.0%
--------------------------------------------------------------------------------
Min -0.1% 0.0% -0.0% -0.0% -0.0%
Max -0.0% 0.0% -0.0% -0.0% -0.0%
Geometric Mean -0.0% +0.0% -0.0% -0.0% -0.0%
Manual inspection of programs in testsuite/tests/programs
---------------------------------------------------------
I built these programs with a bunch of dump flags and `-O` and compared
STG, Cmm, and Asm dumps and file sizes.
(Below the numbers in parenthesis show number of modules in the program)
These programs have identical compiler (same .hi and .o sizes, STG, and
Cmm and Asm dumps):
- Queens (1), andre_monad (1), cholewo-eval (2), cvh_unboxing (3),
andy_cherry (7), fun_insts (1), hs-boot (4), fast2haskell (2),
jl_defaults (1), jq_readsPrec (1), jules_xref (1), jtod_circint (4),
jules_xref2 (1), lennart_range (1), lex (1), life_space_leak (1),
bargon-mangler-bug (7), record_upd (1), rittri (1), sanders_array (1),
strict_anns (1), thurston-module-arith (2), okeefe_neural (1),
joao-circular (6), 10queens (1)
Programs with different compiler outputs:
- jl_defaults (1): For some reason GHC HEAD marks a lot of top-level
`[Int]` closures as CAFFY for no reason. With this patch we no longer
make them CAFFY and generate less SRT entries. For some reason Main.o
is slightly larger with this patch (1.3%) and the executable sizes are
the same. (I'd expect both to be smaller)
- launchbury (1): Same as jl_defaults: top-level `[Int]` closures marked
as CAFFY for no reason. Similarly `Main.o` is 1.4% larger but the
executable sizes are the same.
- galois_raytrace (13): Differences are in the Parse module. There are a
lot, but some of the changes are caused by the fact that for some
reason (I think a bug) GHC HEAD marks the dictionary for `Functor
Identity` as CAFFY. Parse.o is 0.4% larger, the executable size is the
same.
- north_array: We now generate less SRT entries because some of array
primops used in this program like `NewArrayOp` get eliminated during
Stg-to-Cmm and turn some CAFFY things into non-CAFFY. Main.o gets 24%
larger (9224 bytes from 9000 bytes), executable sizes are the same.
- seward-space-leak: Difference in this program is better shown by this
smaller example:
module Lib where
data CDS
= Case [CDS] [(Int, CDS)]
| Call CDS CDS
instance Eq CDS where
Case sels1 rets1 == Case sels2 rets2 =
sels1 == sels2 && rets1 == rets2
Call a1 b1 == Call a2 b2 =
a1 == a2 && b1 == b2
_ == _ =
False
In this program GHC HEAD builds a new SRT for the recursive group of
`(==)`, `(/=)` and the dictionary closure. Then `/=` points to `==`
in its SRT field, and `==` uses the SRT object as its SRT. With this
patch we use the closure for `/=` as the SRT and add `==` there. Then
`/=` gets an empty SRT field and `==` points to `/=` in its SRT
field.
This change looks fine to me.
Main.o gets 0.07% larger, executable sizes are identical.
head.hackage
------------
head.hackage's CI script builds 428 packages from Hackage using this
patch with no failures.
Compiler performance
--------------------
The compiler perf tests report that the compiler allocates slightly more
(worst case observed so far is 4%). However most programs in the test
suite are small, single file programs. To benchmark compiler performance
on something more realistic I build Cabal (the library, 236 modules)
with different optimisation levels. For the "max residency" row I run
GHC with `+RTS -s -A100k -i0 -h` for more accurate numbers. Other rows
are generated with just `-s`. (This is because `-i0` causes running GC
much more frequently and as a result "bytes copied" gets inflated by
more than 25x in some cases)
* -O0
| | GHC HEAD | This MR | Diff |
| --------------- | -------------- | -------------- | ------ |
| Bytes allocated | 54,413,350,872 | 54,701,099,464 | +0.52% |
| Bytes copied | 4,926,037,184 | 4,990,638,760 | +1.31% |
| Max residency | 421,225,624 | 424,324,264 | +0.73% |
* -O1
| | GHC HEAD | This MR | Diff |
| --------------- | --------------- | --------------- | ------ |
| Bytes allocated | 245,849,209,992 | 246,562,088,672 | +0.28% |
| Bytes copied | 26,943,452,560 | 27,089,972,296 | +0.54% |
| Max residency | 982,643,440 | 991,663,432 | +0.91% |
* -O2
| | GHC HEAD | This MR | Diff |
| --------------- | --------------- | --------------- | ------ |
| Bytes allocated | 291,044,511,408 | 291,863,910,912 | +0.28% |
| Bytes copied | 37,044,237,616 | 36,121,690,472 | -2.49% |
| Max residency | 1,071,600,328 | 1,086,396,256 | +1.38% |
Extra compiler allocations
--------------------------
Runtime allocations of programs are as reported above (NoFib section).
The compiler now allocates more than before. Main source of allocation
in this patch compared to base commit is the new SRT algorithm
(GHC.Cmm.Info.Build). Below is some of the extra work we do with this
patch, numbers generated by profiled stage 2 compiler when building a
pathological case (the test 'ManyConstructors') with '-O2':
- We now sort the final STG for a module, which means traversing the
entire program, generating free variable set for each top-level
binding, doing SCC analysis, and re-ordering the program. In
ManyConstructors this step allocates 97,889,952 bytes.
- We now do SRT analysis on static data, which in a program like
ManyConstructors causes analysing 10,000 bindings that we would
previously just skip. This step allocates 70,898,352 bytes.
- We now maintain an SRT map for the entire module as we compile Cmm
groups:
data ModuleSRTInfo = ModuleSRTInfo
{ ...
, moduleSRTMap :: SRTMap
}
(SRTMap is just a strict Map from the 'containers' library)
This map gets an entry for most bindings in a module (exceptions are
THUNKs and CAFFY static functions). For ManyConstructors this map
gets 50015 entries.
- Once we're done with code generation we generate a NameSet from SRTMap
for the non-CAFFY names in the current module. This set gets the same
number of entries as the SRTMap.
- Finally we update CafInfos in ModDetails for the non-CAFFY Ids, using
the NameSet generated in the previous step. This usually does the
least amount of allocation among the work listed here.
Only place with this patch where we do less work in the CAF analysis in
the tidying pass (CoreTidy). However that doesn't save us much, as the
pass still needs to traverse the whole program and update IdInfos for
other reasons. Only thing we don't here do is the `hasCafRefs` pass over
the RHS of bindings, which is a stateless pass that returns a boolean
value, so it doesn't allocate much.
(Metric changes blow are all increased allocations)
Metric changes
--------------
Metric Increase:
ManyAlternatives
ManyConstructors
T13035
T14683
T1969
T9961
|
|
|