| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
- Remove unused mkWildEvBinder
- Use typeTypeOrConstraint - more symmetric and asserts that
that the type is Type or Constraint
- Fix escape sequences in Python; they raise a deprecation warning
with -Wdefault
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I had assumed that wrappers were not inlined in interactive mode.
Meaning we would always execute the compiled wrapper which properly
takes care of upholding the strict field invariant.
This turned out to be wrong. So instead we now run tag inference even
when we generate bytecode. In that case only for correctness not
performance reasons although it will be still beneficial for runtime
in some cases.
I further fixed a bug where GHCi didn't tag nullary constructors
properly when used as arguments. Which caused segfaults when calling
into compiled functions which expect the strict field invariant to
be upheld.
Fixes #22042 and #21083
-------------------------
Metric Increase:
T4801
Metric Decrease:
T13035
-------------------------
|
|
|
|
|
|
|
|
|
| |
The function GHC.Stg.InferTags.Rewrite.isTagged can be given
the Id of a join point, which might be representation polymorphic.
This would cause the call to isUnliftedType to crash. It's better
to use typeLevity_maybe instead.
Fixes #22212
|
|
|
|
|
|
|
|
|
|
| |
For an expression like:
case x of y
Con z -> z
If we also retain the tag sig for z we can generate code to immediately return
it rather than calling out to stg_ap_0_fast.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This does three major things:
* Enforce the invariant that all strict fields must contain tagged
pointers.
* Try to predict the tag on bindings in order to omit tag checks.
* Allows functions to pass arguments unlifted (call-by-value).
The former is "simply" achieved by wrapping any constructor allocations with
a case which will evaluate the respective strict bindings.
The prediction is done by a new data flow analysis based on the STG
representation of a program. This also helps us to avoid generating
redudant cases for the above invariant.
StrictWorkers are created by W/W directly and SpecConstr indirectly.
See the Note [Strict Worker Ids]
Other minor changes:
* Add StgUtil module containing a few functions needed by, but
not specific to the tag analysis.
-------------------------
Metric Decrease:
T12545
T18698b
T18140
T18923
LargeRecord
Metric Increase:
LargeRecord
ManyAlternatives
ManyConstructors
T10421
T12425
T12707
T13035
T13056
T13253
T13253-spj
T13379
T15164
T18282
T18304
T18698a
T1969
T20049
T3294
T4801
T5321FD
T5321Fun
T783
T9233
T9675
T9961
T19695
WWRec
-------------------------
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables worker/wrapper for nested constructed products, as described
in `Note [Nested CPR]`. The machinery for expressing Nested CPR was already
there, since !5054. Worker/wrapper is equipped to exploit Nested CPR annotations
since !5338. CPR analysis already handles applications in batches since !5753.
This patch just needs to flip a few more switches:
1. In `cprTransformDataConWork`, we need to look at the field expressions
and their `CprType`s to see whether the evaluation of the expressions
terminates quickly (= is in HNF) or if they are put in strict fields.
If that is the case, then we retain their CPR info and may unbox nestedly
later on. More details in `Note [Nested CPR]`.
2. Enable nested `ConCPR` signatures in `GHC.Types.Cpr`.
3. In the `asConCpr` call in `GHC.Core.Opt.WorkWrap.Utils`, pass CPR info of
fields to the `Unbox`.
4. Instead of giving CPR signatures to DataCon workers and wrappers, we now have
`cprTransformDataConWork` for workers and treat wrappers by analysing their
unfolding. As a result, the code from GHC.Types.Id.Make went away completely.
5. I deactivated worker/wrappering for recursive DataCons and wrote a function
`isRecDataCon` to detect them. We really don't want to give `repeat` or
`replicate` the Nested CPR property.
See Note [CPR for recursive data structures] for which kind of recursive
DataCons we target.
6. Fix a couple of tests and their outputs.
I also documented that CPR can destroy sharing and lead to asymptotic increase
in allocations (which is tracked by #13331/#19326) in
`Note [CPR for data structures can destroy sharing]`.
Nofib results:
```
--------------------------------------------------------------------------------
Program Allocs Instrs
--------------------------------------------------------------------------------
ben-raytrace -3.1% -0.4%
binary-trees +0.8% -2.9%
digits-of-e2 +5.8% +1.2%
event +0.8% -2.1%
fannkuch-redux +0.0% -1.4%
fish 0.0% -1.5%
gamteb -1.4% -0.3%
mkhprog +1.4% +0.8%
multiplier +0.0% -1.9%
pic -0.6% -0.1%
reptile -20.9% -17.8%
wave4main +4.8% +0.4%
x2n1 -100.0% -7.6%
--------------------------------------------------------------------------------
Min -95.0% -17.8%
Max +5.8% +1.2%
Geometric Mean -2.9% -0.4%
```
The huge wins in x2n1 (loopy list) and reptile (see #19970) are due to
refraining from unboxing (:). Other benchmarks like digits-of-e2 or wave4main
regress because of that. Ultimately there are no great improvements due to
Nested CPR alone, but at least it's a win.
Binary sizes decrease by 0.6%.
There are a significant number of metric decreases. The most notable ones (>1%):
```
ManyAlternatives(normal) ghc/alloc 771656002.7 762187472.0 -1.2%
ManyConstructors(normal) ghc/alloc 4191073418.7 4114369216.0 -1.8%
MultiLayerModules(normal) ghc/alloc 3095678333.3 3128720704.0 +1.1%
PmSeriesG(normal) ghc/alloc 50096429.3 51495664.0 +2.8%
PmSeriesS(normal) ghc/alloc 63512989.3 64681600.0 +1.8%
PmSeriesV(normal) ghc/alloc 62575424.0 63767208.0 +1.9%
T10547(normal) ghc/alloc 29347469.3 29944240.0 +2.0%
T11303b(normal) ghc/alloc 46018752.0 47367576.0 +2.9%
T12150(optasm) ghc/alloc 81660890.7 82547696.0 +1.1%
T12234(optasm) ghc/alloc 59451253.3 60357952.0 +1.5%
T12545(normal) ghc/alloc 1705216250.7 1751278952.0 +2.7%
T12707(normal) ghc/alloc 981000472.0 968489800.0 -1.3% GOOD
T13056(optasm) ghc/alloc 389322664.0 372495160.0 -4.3% GOOD
T13253(normal) ghc/alloc 337174229.3 341954576.0 +1.4%
T13701(normal) ghc/alloc 2381455173.3 2439790328.0 +2.4% BAD
T14052(ghci) ghc/alloc 2162530642.7 2139108784.0 -1.1%
T14683(normal) ghc/alloc 3049744728.0 2977535064.0 -2.4% GOOD
T14697(normal) ghc/alloc 362980213.3 369304512.0 +1.7%
T15164(normal) ghc/alloc 1323102752.0 1307480600.0 -1.2%
T15304(normal) ghc/alloc 1304607429.3 1291024568.0 -1.0%
T16190(normal) ghc/alloc 281450410.7 284878048.0 +1.2%
T16577(normal) ghc/alloc 7984960789.3 7811668768.0 -2.2% GOOD
T17516(normal) ghc/alloc 1171051192.0 1153649664.0 -1.5%
T17836(normal) ghc/alloc 1115569746.7 1098197592.0 -1.6%
T17836b(normal) ghc/alloc 54322597.3 55518216.0 +2.2%
T17977(normal) ghc/alloc 47071754.7 48403408.0 +2.8%
T17977b(normal) ghc/alloc 42579133.3 43977392.0 +3.3%
T18923(normal) ghc/alloc 71764237.3 72566240.0 +1.1%
T1969(normal) ghc/alloc 784821002.7 773971776.0 -1.4% GOOD
T3294(normal) ghc/alloc 1634913973.3 1614323584.0 -1.3% GOOD
T4801(normal) ghc/alloc 295619648.0 292776440.0 -1.0%
T5321FD(normal) ghc/alloc 278827858.7 276067280.0 -1.0%
T5631(normal) ghc/alloc 586618202.7 577579960.0 -1.5%
T5642(normal) ghc/alloc 494923048.0 487927208.0 -1.4%
T5837(normal) ghc/alloc 37758061.3 39261608.0 +4.0%
T9020(optasm) ghc/alloc 257362077.3 254672416.0 -1.0%
T9198(normal) ghc/alloc 49313365.3 50603936.0 +2.6% BAD
T9233(normal) ghc/alloc 704944258.7 685692712.0 -2.7% GOOD
T9630(normal) ghc/alloc 1476621560.0 1455192784.0 -1.5%
T9675(optasm) ghc/alloc 443183173.3 433859696.0 -2.1% GOOD
T9872a(normal) ghc/alloc 1720926653.3 1693190072.0 -1.6% GOOD
T9872b(normal) ghc/alloc 2185618061.3 2162277568.0 -1.1% GOOD
T9872c(normal) ghc/alloc 1765842405.3 1733618088.0 -1.8% GOOD
TcPlugin_RewritePerf(normal) ghc/alloc 2388882730.7 2365504696.0 -1.0%
WWRec(normal) ghc/alloc 607073186.7 597512216.0 -1.6%
T9203(normal) run/alloc 107284064.0 102881832.0 -4.1%
haddock.Cabal(normal) run/alloc 24025329589.3 23768382560.0 -1.1%
haddock.base(normal) run/alloc 25660521653.3 25370321824.0 -1.1%
haddock.compiler(normal) run/alloc 74064171706.7 73358712280.0 -1.0%
```
The biggest exception to the rule is T13701 which seems to fluctuate as usual
(not unlike T12545). T14697 has a similar quality, being a generated
multi-module test. T5837 is small enough that it similarly doesn't measure
anything significant besides module loading overhead.
T13253 simply does one additional round of Simplification due to Nested CPR.
There are also some apparent regressions in T9198, T12234 and PmSeriesG that we
(@mpickering and I) were simply unable to reproduce locally. @mpickering tried
to run the CI script in a local Docker container and actually found that T9198
and PmSeriesG *improved*. In MRs that were rebased on top this one, like !4229,
I did not experience such increases. Let's not get hung up on these regression
tests, they were meant to test for asymptotic regressions.
The build-cabal test improves by 1.2% in -O0.
Metric Increase:
T10421
T12234
T12545
T13035
T13056
T13701
T14697
T18923
T5837
T9198
Metric Decrease:
ManyConstructors
T12545
T12707
T13056
T14683
T16577
T18223
T1969
T3294
T9203
T9233
T9675
T9872a
T9872b
T9872c
T9961
TcPlugin_RewritePerf
|
|
|
|
|
|
|
|
| |
This patch fixes #19717, a long-standing bug in CSE for STG, which
led to a stupid loss of CSE in some situations.
It's explained in Note [Trivial case scrutinee], which I have
substantially extended.
|
|
|
|
|
|
|
| |
Introduce GHC.Unit.* hierarchy for everything concerning units, packages
and modules.
Update Haddock submodule
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Trac #10069 revealed that small NOINLINE functions didn't get split
into worker and wrapper. This was due to `certainlyWillInline`
saying that any unfoldings with a guidance of `UnfWhen` inline
unconditionally. That isn't the case for NOINLINE functions, so we
catch this case earlier now.
Nofib results:
--------------------------------------------------------------------------------
Program Allocs Instrs
--------------------------------------------------------------------------------
fannkuch-redux -0.3% 0.0%
gg +0.0% +0.1%
maillist -0.2% -0.2%
minimax 0.0% -0.8%
--------------------------------------------------------------------------------
Min -0.3% -0.8%
Max +0.0% +0.1%
Geometric Mean -0.0% -0.0%
Fixes #10069.
-------------------------
Metric Increase:
T9233
-------------------------
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes the following improvement:
- Automatically records test metrics (per test environment) so that
the programmer need not supply nor update expected values in *.T
files.
- On expected metric changes, the programmer need only indicate the
direction of change in the git commit message.
- Provides a simple python tool "perf_notes.py" to compare metrics
over time.
Issues:
- Using just the previous commit allows performance to drift with each
commit.
- Currently we allow drift as we have a preference for minimizing
false positives.
- Some possible alternatives include:
- Use metrics from a fixed commit per test: the last commit that
allowed a change in performance (else the oldest metric)
- Or use some sort of aggregate since the last commit that allowed
a change in performance (else all available metrics)
- These alternatives may result in a performance issue (with the
test driver) having to heavily search git commits/notes.
- Run locally, performance tests will trivially pass unless the tests
were run locally on the previous commit. This is often not the case
e.g. after pulling recent changes.
Previously, *.T files contain statements such as:
```
stats_num_field('peak_megabytes_allocated', (2, 1))
compiler_stats_num_field('bytes allocated',
[(wordsize(64), 165890392, 10)])
```
This required the programmer to give the expected values and a tolerance
deviation (percentage). With this patch, the above statements are
replaced with:
```
collect_stats('peak_megabytes_allocated', 5)
collect_compiler_stats('bytes allocated', 10)
```
So that programmer must only enter which metrics to test and a tolerance
deviation. No expected value is required. CircleCI will then run the
tests per test environment and record the metrics to a git note for that
commit and push them to the git.haskell.org ghc repo. Metrics will be
compared to the previous commit. If they are different by the tolerance
deviation from the *.T file, then the corresponding test will fail. By
adding to the git commit message e.g.
```
# Metric (In|De)crease <metric(s)> <options>: <tests>
Metric Increase ['bytes allocated', 'peak_megabytes_allocated'] \
(test_env='linux_x86', way='default'):
Test012, Test345
Metric Decrease 'bytes allocated':
Test678
Metric Increase:
Test711
```
This will allow the noted changes (letting the test pass). Note that by
omitting metrics or options, the change will apply to all possible
metrics/options (i.e. in the above, an increase for all metrics in all
test environments is allowed for Test711)
phabricator will use the message in the description
Reviewers: bgamari, hvr
Reviewed By: bgamari
Subscribers: rwbarton, carter
GHC Trac Issues: #12758
Differential Revision: https://phabricator.haskell.org/D5059
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Given two unboxed sum terms:
(# 1 | #) :: (# Int | Int# #)
(# 1 | #) :: (# Int | Int #)
These two terms are not equal as they unarise to different unboxed
tuples. However StgCse was thinking that these are equal, and replacing
one of these with a binder to the other.
To not deal with unboxed sums in StgCse we now do it after unarise. For
StgCse to maintain post-unarise invariants we factor-out case binder
in-scopeness check to `stgCaseBndrInScope` and use it in StgCse.
Also did some refactoring in SimplStg.
Another way to fix this would be adding a special case in StgCse to not
bring unboxed sum binders in scope:
diff --git a/compiler/simplStg/StgCse.hs
b/compiler/simplStg/StgCse.hs
index 6c740ca4cb..93a0f8f6ad 100644
--- a/compiler/simplStg/StgCse.hs
+++ b/compiler/simplStg/StgCse.hs
@@ -332,7 +332,11 @@ stgCseExpr env (StgLetNoEscape binds body)
stgCseAlt :: CseEnv -> OutId -> InStgAlt -> OutStgAlt
stgCseAlt env case_bndr (DataAlt dataCon, args, rhs)
= let (env1, args') = substBndrs env args
- env2 = addDataCon case_bndr dataCon (map StgVarArg
args') env1
+ env2
+ | isUnboxedSumCon dataCon
+ = env1
+ | otherwise
+ = addDataCon case_bndr dataCon (map StgVarArg args')
env1
-- see note [Case 2: CSEing case binders]
rhs' = stgCseExpr env2 rhs
in (DataAlt dataCon, args', rhs')
I think this patch seems better in that it doesn't add a special case to
StgCse.
Test Plan:
Validate.
I tried to come up with a minimal example but failed. I thought a simple
program like
data T = T (# Int | Int #) (# Int# | Int #)
case T (# 1 | #) (# 1 | #) of ...
should be enough to trigger this bug, but for some reason StgCse
doesn't do
anything on this program.
Reviewers: simonpj, bgamari
Reviewed By: simonpj
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #15300
Differential Revision: https://phabricator.haskell.org/D4962
|
|
|
|
|
|
| |
as proposed in #13588.
Differential Revision: https://phabricator.haskell.org/D3467
|
| |
|
| |
|
|
|
|
| |
which counts allocations instead of observing recomputation directly.
|
|
|
|
|
|
|
|
|
| |
as they might be marked as one-shot, and suddenly we’d evaluate them
multiple times. This came up in #13536 (test cases included).
The solution was layed out by SPJ in ticket:13536#comment:12.
Differential Revision: https://phabricator.haskell.org/D3437
|
|
This CSE pass only targets data constructor applications. This is
probably the best we can do, as function calls and primitive operations
might have side-effects.
Introduces the flag -fstg-cse, enabled by default with -O for now. It
might also be a good candiate for -O2.
Differential Revision: https://phabricator.haskell.org/D2871
|