| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
Addressing review comments on D4637
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The idea here is to save a little code size and some work in the GC,
by collapsing FUN_STATIC closures and their SRTs.
This is (4) in a series; see D4632 for more details.
There's a tradeoff here: more complexity in the compiler in exchange
for a modest code size reduction (probably around 0.5%).
Results:
* GHC binary itself (statically linked) is 1% smaller
* -0.2% binary sizes in nofib (-0.5% module sizes)
Full nofib results comparing D4634 with this: P177 (ignore runtimes,
these aren't stable on my laptop)
Test Plan: validate, nofib
Reviewers: bgamari, niteria, simonpj, erikd
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4637
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- Previously we would hvae a single big table of pointers per module,
with a set of bitmaps to reference entries within it. The new
representation is identical to a static constructor, which is much
simpler for the GC to traverse, and we get to remove the complicated
bitmap-traversal code from the GC.
- Rewrite all the code to generate SRTs in CmmBuildInfoTables, and
document it much better (see Note [SRTs]). This has been something
I've wanted to do since we moved to the new code generator, I
finally had the opportunity to finish it while on a transatlantic
flight recently :)
There are a series of 4 diffs:
1. D4632 (this one), which does the bulk of the changes
2. D4633 which adds support for smaller `CmmLabelDiffOff` constants
3. D4634 which takes advantage of D4632 and D4633 to save a word in
info tables that have an SRT on x86_64. This is where most of the
binary size improvement comes from.
4. D4637 which makes a further optimisation to merge some SRTs with
static FUN closures. This adds some complexity and the benefits
are fairly modest, so it's not clear yet whether we should do this.
Results (after (3), on x86_64)
- GHC itself (staticaly linked) is 5.2% smaller
- -1.7% binary sizes in nofib, -2.9% module sizes. Full nofib results: P176
- I measured the overhead of traversing all the static objects in a
major GC in GHC itself by doing `replicateM_ 1000 performGC` as the
first thing in `Main.main`. The new version was 5-10% faster, but
the results did vary quite a bit.
- I'm not sure if there's a compile-time difference, the results are
too unreliable.
Test Plan: validate
Reviewers: bgamari, michalt, niteria, simonpj, erikd, osa1
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4632
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is mostly for congruence with 'subWordC#' and '{add,sub}IntC#'.
I found 'plusWord2#' while implementing this, which both lacks
documentation and has a slightly different specification than
'addWordC#', which means the generic implementation is unnecessarily
complex.
While I was at it, I also added lacking meta-information on PrimOps
and refactored 'subWordC#'s generic implementation to be branchless.
Reviewers: bgamari, simonmar, jrtc27, dfeuer
Reviewed By: bgamari, dfeuer
Subscribers: dfeuer, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4592
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: bgamari, simonmar
Reviewed By: bgamari
Subscribers: dfeuer, rwbarton, thomie, carter
GHC Trac Issues: #4442
Differential Revision: https://phabricator.haskell.org/D4488
|
|
|
|
|
|
|
|
|
| |
This subtle patch fixes Trac #5129 (again; comment:20
and following).
I took the opportunity to document seq# properly; see
Note [seq# magic] in PrelRules, and Note [seq# and expr_ok]
in CoreUtils.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
get/setAllocationCounter didn't take into account allocations in the
current block. This was known at the time, but it turns out to be
important to have more accuracy when using these in a fine-grained
way.
Test Plan:
New unit test to test incrementally larger allocaitons. Before I got
results like this:
```
+0
+0
+0
+0
+0
+4096
+0
+0
+0
+0
+0
+4064
+0
+0
+4088
+4056
+0
+0
+0
+4088
+4096
+4056
+4096
```
Notice how the results aren't always monotonically increasing. After
this patch:
```
+344
+416
+488
+560
+632
+704
+776
+848
+920
+992
+1064
+1136
+1208
+1280
+1352
+1424
+1496
+1568
+1640
+1712
+1784
+1856
+1928
+2000
+2072
+2144
```
Reviewers: hvr, erikd, simonmar, jrtc27, trommler
Reviewed By: simonmar
Subscribers: trommler, jrtc27, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4363
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes a bunch of unnecessary includes of `HsVersions.h` along
with unnecessary CPP (e.g., due to checking for DEBUG which can be
achieved by looking at `debugIsOn`)
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: bgamari, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4462
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was broken by D3746 and/or D3809, but unfortunately we didn't
notice because CI at the time wasn't building the profiling way.
Test Plan:
```
cd testsuite/test/profiling/should_run
make WAY=ghci-ext-prof
```
Reviewers: bgamari, michalt, hvr, erikd
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #14705
Differential Revision: https://phabricator.haskell.org/D4437
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The pattern `threadCapability =<< myThreadId` is used a lot in code
that uses `hs_try_putmvar`, I want to make it cheaper.
Test Plan: validate
Reviewers: bgamari, erikd
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4381
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: validate
Reviewers: bgamari, erikd
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4380
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an obvious optimisation whose overhead is neglectable but
which significantly simplifies the common uses of `compareByteArrays#`
which would otherwise require to make *careful* use of
`reallyUnsafePtrEquality#` or (equally fragile) `byteArrayContents#`
which can result in less optimal assembler code being generated.
Test Plan: carefully examined generated cmm/asm code; validate via phab
Reviewers: alexbiehl, bgamari, simonmar
Reviewed By: bgamari, simonmar
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4319
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support for the bit deposit and extraction operations provided
by the BMI and BMI2 instruction set extensions on modern amd64 machines.
Implement x86 code generator for pdep and pext. Properly initialise
bmiVersion field.
pdep and pext test cases
Fix pattern match for pdep and pext instructions
Fix build of pdep and pext code for 32-bit architectures
Test Plan: Validate
Reviewers: austin, simonmar, bgamari, angerman
Reviewed By: bgamari
Subscribers: trommler, carter, angerman, thomie, rwbarton, newhoggy
GHC Trac Issues: #14206
Differential Revision: https://phabricator.haskell.org/D4236
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: bgamari, simonmar
Reviewed By: simonmar
Subscribers: alexbiehl, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4309
|
|
|
|
| |
This reverts commit a1a689dda48113f3735834350fb562bb1927a633.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
get/setAllocationCounter didn't take into account allocations in the
current block. This was known at the time, but it turns out to be
important to have more accuracy when using these in a fine-grained
way.
Test Plan:
New unit test to test incrementally larger allocaitons. Before I got
results like this:
```
+0
+0
+0
+0
+0
+4096
+0
+0
+0
+0
+0
+4064
+0
+0
+4088
+4056
+0
+0
+0
+4088
+4096
+4056
+4096
```
Notice how the results aren't always monotonically increasing. After
this patch:
```
+344
+416
+488
+560
+632
+704
+776
+848
+920
+992
+1064
+1136
+1208
+1280
+1352
+1424
+1496
+1568
+1640
+1712
+1784
+1856
+1928
+2000
+2072
+2144
```
Reviewers: niteria, bgamari, hvr, erikd
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4288
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a follow-up after faf60e85 - Make tagForCon non-linear.
On the mailing list @simonpj suggested to solve the
linear behavior by caching the sizes.
Test Plan: ./validate
Reviewers: simonpj, simonmar, bgamari, austin
Reviewed By: simonpj
Subscribers: carter, goldfire, rwbarton, thomie, simonpj
Differential Revision: https://phabricator.haskell.org/D4131
|
| |
|
| |
|
|
|
|
|
|
| |
This broke the 32-bit build.
This reverts commit f5dc8ccc29429d0a1d011f62b6b430f6ae50290c.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support for the bit deposit and extraction operations provided
by the BMI and BMI2 instruction set extensions on modern amd64 machines.
Test Plan: Validate
Reviewers: austin, simonmar, bgamari, hvr, goldfire, erikd
Reviewed By: bgamari
Subscribers: goldfire, erikd, trommler, newhoggy, rwbarton, thomie
GHC Trac Issues: #14206
Differential Revision: https://phabricator.haskell.org/D4063
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is another step for fixing #13825 and is based on D38 by Simon
Marlow.
The change allows storing multiple constructor fields within the same
word. This currently applies only to `Float`s, e.g.,
```
data Foo = Foo {-# UNPACK #-} !Float {-# UNPACK #-} !Float
```
on 64-bit arch, will now store both fields within the same constructor
word. For `WordX/IntX` we'll need to introduce new primop types.
Main changes:
- We now use sizes in bytes when we compute the offsets for
constructor fields in `StgCmmLayout` and introduce padding if
necessary (word-sized fields are still word-aligned)
- `ByteCodeGen` had to be updated to correctly construct the data
types. This required some new bytecode instructions to allow pushing
things that are not full words onto the stack (and updating
`Interpreter.c`). Note that we only use the packed stuff when
constructing data types (i.e., for `PACK`), in all other cases the
behavior should not change.
- `RtClosureInspect` was changed to handle the new layout when
extracting subterms. This seems to be used by things like `:print`.
I've also added a test for this.
- I deviated slightly from Simon's approach and use `PrimRep` instead
of `ArgRep` for computing the size of fields. This seemed more
natural and in the future we'll probably want to introduce new
primitive types (e.g., `Int8#`) and `PrimRep` seems like a better
place to do that (where we already have `Int64Rep` for example).
`ArgRep` on the other hand seems to be more focused on calling
functions.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: bgamari, simonmar, austin, hvr, goldfire, erikd
Reviewed By: bgamari
Subscribers: maoe, rwbarton, thomie
GHC Trac Issues: #13825
Differential Revision: https://phabricator.haskell.org/D3809
|
|
|
|
|
|
|
|
|
|
|
|
| |
Depends on D4090
Reviewers: austin, bgamari, erikd, simonmar, alexbiehl
Reviewed By: bgamari
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D4091
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Computing the number of constructors for TyCon is linear
in the number of constructors.
That's wasteful if all you want to check is if that
number is smaller than what fits in tag bits
(usually 8 things).
What this change does is to use a function that can
determine the ineqaulity without computing the size.
This improves compile time on a module with a
data type that has 10k constructors.
The variance in total time is (suspiciously) high,
but going by the best of 3 the numbers are 8.186s vs 7.511s.
For 1000 constructors the difference isn't noticeable:
0.646s vs 0.624s.
The hot spots were cgDataCon and cgEnumerationTyCon
where tagForCon is called in a loop.
One alternative would be to pass down the size.
Test Plan: harbormaster
Reviewers: bgamari, simonmar, austin
Reviewed By: simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D4116
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The simplifier can simplify this without any trouble. Moreover, the
unboxed tuples cause bootstrapping issues due #14123.
I also went ahead and inlined a few definitions into the Monad instance.
Test Plan: Validate
Reviewers: austin, simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D4026
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This switches the compiler/ component to get compiled with
-XNoImplicitPrelude and a `import GhcPrelude` is inserted in all
modules.
This is motivated by the upcoming "Prelude" re-export of
`Semigroup((<>))` which would cause lots of name clashes in every
modulewhich imports also `Outputable`
Reviewers: austin, goldfire, bgamari, alanz, simonmar
Reviewed By: bgamari
Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
Differential Revision: https://phabricator.haskell.org/D3989
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously due to #12759 we disabled PIE support entirely. However, this
breaks the user's ability to produce PIEs. Add an explicit flag, -fPIE,
allowing the user to build PIEs.
Test Plan: Validate
Reviewers: rwbarton, austin, simonmar
Subscribers: trommler, simonmar, trofi, jrtc27, thomie
GHC Trac Issues: #12759, #13702
Differential Revision: https://phabricator.haskell.org/D3589
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This copies the subset of Hoopl's functionality needed by GHC to
`cmm/Hoopl` and removes the dependency on the Hoopl package.
The main motivation for this change is the confusing/noisy interface
between GHC and Hoopl:
- Hoopl has `Label` which is GHC's `BlockId` but different than
GHC's `CLabel`
- Hoopl has `Unique` which is different than GHC's `Unique`
- Hoopl has `Unique{Map,Set}` which are different than GHC's
`Uniq{FM,Set}`
- GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is
needed just to filter the exposed functions (filter out some of the
Hoopl's and add the GHC ones)
With this change, we'll be able to simplify this significantly.
It'll also be much easier to do invasive changes (Hoopl is a public
package on Hackage with users that depend on the current behavior)
This should introduce no changes in functionality - it merely
copies the relevant code.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: austin, bgamari, simonmar
Reviewed By: bgamari, simonmar
Subscribers: simonpj, kavon, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3616
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While investigating #12545, I discovered several places in the code
that performed length-checks like so:
```
length ts == 4
```
This is not ideal, since the length of `ts` could be much longer than 4,
and we'd be doing way more work than necessary! There are already a slew
of helper functions in `Util` such as `lengthIs` that are designed to do
this efficiently, so I found every place where they ought to be used and
did just that. I also defined a couple more utility functions for list
length that were common patterns (e.g., `ltLength`).
Test Plan: ./validate
Reviewers: austin, hvr, goldfire, bgamari, simonmar
Reviewed By: bgamari, simonmar
Subscribers: goldfire, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3622
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In Phab:D3265 we introduced MO_F32_Fabs and MO_F64_Fabs.
This patch improves code generation by generating PowerPC fabs
instructions.
Test Plan: run numeric/should_run/numrun015 or validate
Reviewers: austin, bgamari, hvr, simonmar, erikd
Reviewed By: erikd
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3512
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch in in preparation for the fix to Trac #13397
The code generator has a special case for
case tagToEnum (a>#b) of
False -> e1
True -> e2
but it was not doing nearly so well on
case a>#b of
DEFAULT -> e1
1# -> e2
This patch arranges to behave essentially identically in
both cases. In due course we can eliminate the special
case for tagToEnum#, once we've completed Trac #13397.
The changes are:
* Make CmmSink swizzle the order of a conditional where necessary;
see Note [Improving conditionals] in CmmSink
* Hack the general case of StgCmmExpr.cgCase so that it use
NoGcInAlts for conditionals. This doesn't seem right, but it's
the same choice as the tagToEnum version. Without it, code size
increases a lot (more heap checks).
There's a loose end here.
* Add comments in CmmOpt.cmmMachOpFoldM
|
|
|
|
| |
Just a simple refactoring to remove duplication
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Provide PowerPC optimised implementations of callish prim ops.
MO_?_QuotRem
The generic implementation of quotient remainder prim ops uses
a division and a remainder operation. There is no remainder on
PowerPC and so we need to implement remainder "by hand" which
results in a duplication of the divide operation when using the
generic code.
Avoid this duplication by implementing the prim op in the native
code generator.
MO_U_Mul2
Use PowerPC's instructions for long multiplication.
Addition and subtraction
Use PowerPC add/subtract with carry/overflow instructions
MO_Clz and MO_Ctz
Use PowerPC's CNTLZ instruction and implement count trailing
zeros using count leading zeros
MO_QuotRem2
Implement an algorithm given by Henry Warren in "Hacker's Delight"
using PowerPC divide instruction. TODO: Use long division instructions
when available (POWER7 and later).
Test Plan: validate on AIX and 32-bit Linux
Reviewers: simonmar, erikd, hvr, austin, bgamari
Reviewed By: erikd, hvr, bgamari
Subscribers: trofi, kgardas, thomie
Differential Revision: https://phabricator.haskell.org/D2973
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before ghc-7.2 hs_add_root() had to be used to initialize haskell
modules when haskell was called from FFI.
commit a52ff7619e8b7d74a9d933d922eeea49f580bca8
("Change the way module initialisation is done (#3252, #4417)")
removed needs for hs_add_root() and made function a no-op.
For backward compatibility '__stginit_<module>' symbol was
not removed.
This change removes no-op hs_add_root() function and unused
'__stginit_<module>' symbol from each haskell module.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Test Plan: ./validate
Reviewers: simonmar, austin, bgamari, erikd
Reviewed By: simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3460
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This refactoring makes it more obvious when we are constructing
a Node for the digraph rather than a less useful 3-tuple.
Reviewers: austin, goldfire, bgamari, simonmar, dfeuer
Reviewed By: dfeuer
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3414
|
|
|
|
|
|
|
|
| |
Trac #13394, comment:4 showed up another place where we were testing
for the representation of of a type; and it turned out to be a JoinId
which can be rep-polymorphic.
Just putting the test in the right places solves this easily.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we have this in libraries/base/GHC/Float.hs:
```
abs x | x == 0 = 0 -- handles (-0.0)
| x > 0 = x
| otherwise = negateFloat x
```
But 3-4 years ago it was noted that this was inefficient:
https://mail.haskell.org/pipermail/libraries/2013-April/019690.html
We can generate better code for X86 and llvm and for others generate
some custom cmm code which is similar to what the compiler generates
now.
Reviewers: austin, simonmar, hvr, bgamari
Reviewed By: bgamari
Subscribers: dfeuer, thomie
Differential Revision: https://phabricator.haskell.org/D3265
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
And use to mark `stg_stack_underflow_frame`, which we are unable to
determine a caller from.
To simplify parsing at the moment we steal the `return` keyword to
indicate an undefined unwind value. Perhaps this should be revisited.
Reviewers: scpmw, simonmar, austin, erikd
Subscribers: dfeuer, thomie
Differential Revision: https://phabricator.haskell.org/D2738
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As discussed in D1532, Trac Trac #11337, and Trac Trac #11338, the stack
unwinding information produced by GHC is currently quite approximate.
Essentially we assume that register values do not change at all within a
basic block. While this is somewhat true in normal Haskell code, blocks
containing foreign calls often break this assumption. This results in
unreliable call stacks, especially in the code containing foreign calls.
This is worse than it sounds as unreliable unwinding information can at
times result in segmentation faults.
This patch set attempts to improve this situation by tracking unwinding
information with finer granularity. By dispensing with the assumption of
one unwinding table per block, we allow the compiler to accurately
represent the areas surrounding foreign calls.
Towards this end we generalize the representation of unwind information
in the backend in three ways,
* Multiple CmmUnwind nodes can occur per block
* CmmUnwind nodes can now carry unwind information for multiple
registers (while not strictly necessary; this makes emitting
unwinding information a bit more convenient in the compiler)
* The NCG backend is given an opportunity to modify the unwinding
records since it may need to make adjustments due to, for instance,
native calling convention requirements for foreign calls (see
#11353).
This sets the stage for resolving #11337 and #11338.
Test Plan: Validate
Reviewers: scpmw, simonmar, austin, erikd
Subscribers: qnikst, thomie
Differential Revision: https://phabricator.haskell.org/D2741
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commits relaxes the invariants of the Core syntax so that a
top-level variable can be bound to a primitive string literal of type
Addr#.
This commit:
* Relaxes the invatiants of the Core, and allows top-level bindings whose
type is Addr# as long as their RHS is either a primitive string literal or
another variable.
* Allows the simplifier and the full-laziness transformer to float out
primitive string literals to the top leve.
* Introduces the new StgGenTopBinding type to accomodate top-level Addr#
bindings.
* Introduces a new type of labels in the object code, with the suffix "_bytes",
for exported top-level Addr# bindings.
* Makes some built-in rules more robust. This was necessary to keep them
functional after the above changes.
This is a continuation of D2554.
Rebasing notes:
This had two slightly suspicious performance regressions:
* T12425: bytes allocated regressed by roughly 5%
* T4029: bytes allocated regressed by a bit over 1%
* T13035: bytes allocated regressed by a bit over 5%
These deserve additional investigation.
Rebased by: bgamari.
Test Plan: ./validate --slow
Reviewers: goldfire, trofi, simonmar, simonpj, austin, hvr, bgamari
Reviewed By: trofi, simonpj, bgamari
Subscribers: trofi, simonpj, gridaphobe, thomie
Differential Revision: https://phabricator.haskell.org/D2605
GHC Trac Issues: #8472
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit implements the proposal in
https://github.com/ghc-proposals/ghc-proposals/pull/29 and
https://github.com/ghc-proposals/ghc-proposals/pull/35.
Here are some of the pieces of that proposal:
* Some of RuntimeRep's constructors have been shortened.
* TupleRep and SumRep are now parameterized over a list of RuntimeReps.
* This
means that two types with the same kind surely have the same
representation.
Previously, all unboxed tuples had the same kind, and thus the fact
above was
false.
* RepType.typePrimRep and friends now return a *list* of PrimReps. These
functions can now work successfully on unboxed tuples. This change is
necessary because we allow abstraction over unboxed tuple types and so
cannot
always handle unboxed tuples specially as we did before.
* We sometimes have to create an Id from a PrimRep. I thus split PtrRep
* into
LiftedRep and UnliftedRep, so that the created Ids have the right
strictness.
* The RepType.RepType type was removed, as it didn't seem to help with
* much.
* The RepType.repType function is also removed, in favor of typePrimRep.
* I have waffled a good deal on whether or not to keep VoidRep in
TyCon.PrimRep. In the end, I decided to keep it there. PrimRep is *not*
represented in RuntimeRep, and typePrimRep will never return a list
including
VoidRep. But it's handy to have in, e.g., ByteCodeGen and friends. I can
imagine another design choice where we have a PrimRepV type that is
PrimRep
with an extra constructor. That seemed to be a heavier design, though,
and I'm
not sure what the benefit would be.
* The last, unused vestiges of # (unliftedTypeKind) have been removed.
* There were several pretty-printing bugs that this change exposed;
* these are fixed.
* We previously checked for levity polymorphism in the types of binders.
* But we
also must exclude levity polymorphism in function arguments. This is
hard to check
for, requiring a good deal of care in the desugarer. See Note [Levity
polymorphism
checking] in DsMonad.
* In order to efficiently check for levity polymorphism in functions, it
* was necessary
to add a new bit of IdInfo. See Note [Levity info] in IdInfo.
* It is now safe for unlifted types to be unsaturated in Core. Core Lint
* is updated
accordingly.
* We can only know strictness after zonking, so several checks around
* strictness
in the type-checker (checkStrictBinds, the check for unlifted variables
under a ~
pattern) have been moved to the desugarer.
* Along the way, I improved the treatment of unlifted vs. banged
* bindings. See
Note [Strict binds checks] in DsBinds and #13075.
* Now that we print type-checked source, we must be careful to print
* ConLikes correctly.
This is facilitated by a new HsConLikeOut constructor to HsExpr.
Particularly troublesome
are unlifted pattern synonyms that get an extra void# argument.
* Includes a submodule update for haddock, getting rid of #.
* New testcases:
typecheck/should_fail/StrictBinds
typecheck/should_fail/T12973
typecheck/should_run/StrictPats
typecheck/should_run/T12809
typecheck/should_fail/T13105
patsyn/should_fail/UnliftedPSBind
typecheck/should_fail/LevPolyBounded
typecheck/should_compile/T12987
typecheck/should_compile/T11736
* Fixed tickets:
#12809
#12973
#11736
#13075
#12987
* This also adds a test case for #13105. This test case is
* "compile_fail" and
succeeds, because I want the testsuite to monitor the error message.
When #13105 is fixed, the test case will compile cleanly.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes some cases of wrong stacks being generated by the profiler.
For background and details on the fix see
`Note [Evaluating functions with profiling]` in `rts/Apply.cmm`.
This does have an impact on allocations for some programs when
profiling. nofib results:
```
k-nucleotide +0.0% +8.8% +11.0% +11.0% 0.0%
puzzle +0.0% +12.5% 0.244 0.246 0.0%
typecheck 0.0% +8.7% +16.1% +16.2% 0.0%
------------------------------------------------------------------------
--------
Min -0.0% -0.0% -34.4% -35.5% -25.0%
Max +0.0% +12.5% +48.9% +49.4% +10.6%
Geometric Mean +0.0% +0.6% +2.0% +1.8% -0.3%
```
But runtimes don't seem to be affected much, and the examples I looked
at were completely legitimate. For example, in puzzle we have this:
```
position :: ItemType -> StateType -> BankType
position Bono = bonoPos
position Edge = edgePos
position Larry = larryPos
position Adam = adamPos
```
where the identifiers on the rhs are all record selectors. Previously
the profiler gave a stack that looked like
```
position
bonoPos
...
```
i.e. `bonoPos` was at the same level of the call stack as `position`,
but now it looks like
```
position
bonoPos
...
```
I used the normaliser from the testsuite to diff the profiling output
from other nofib programs and they all looked better.
Test Plan:
* the broken test passes
* validate
* compiled and ran all of nofib, measured perf, diff'd several .prof
files
Reviewers: niteria, erikd, austin, scpmw, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2804
GHC Trac Issues: #5654, #10007
|