| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
| |
This moves all URL references to Trac tickets to their corresponding
GitLab counterparts.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The splitter is an evil Perl script that processes assembler code.
Its job can be done better by the linker's --gc-sections flag. GHC
passes this flag to the linker whenever -split-sections is passed on
the command line.
This is based on @DemiMarie's D2768.
Fixes Trac #11315
Fixes Trac #9832
Fixes Trac #8964
Fixes Trac #8685
Fixes Trac #8629
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Smaller than word size integers must be promoted to word size
when passed on the stack. While on little endian systems we can
get away with writing a small integer to a word size stack slot
and read it as a word ignoring the upper bits, on big endian
systems a small integer write ends up in the most significant
bits and a word size read that ignores the upper bits delivers
a random value.
On little endian systems a smaller than word size write to
the stack might be more efficient but that decision is
system specific and should be done as an optimization in the
respective backends.
Fixes #16258
|
|
|
|
| |
Fixes #16222
|
|
|
|
| |
Also used ByteString in some other relevant places
|
| |
|
|
|
|
|
|
|
| |
Support for Mac OS X on PowerPC has been dropped by Apple years ago. We
follow suit and remove PowerPC support for Darwin.
Fixes #16106.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes a fairly long-standing bug (dating back to 2015) in
RdrName.bestImport, namely
commit 9376249b6b78610db055a10d05f6592d6bbbea2f
Author: Simon Peyton Jones <simonpj@microsoft.com>
Date: Wed Oct 28 17:16:55 2015 +0000
Fix unused-import stuff in a better way
In that patch got the sense of the comparison back to front, and
thereby failed to implement the unused-import rules described in
Note [Choosing the best import declaration] in RdrName
This led to Trac #13064 and #15393
Fixing this bug revealed a bunch of unused imports in libraries;
the ones in the GHC repo are part of this commit.
The two important changes are
* Fix the bug in bestImport
* Modified the rules by adding (a) in
Note [Choosing the best import declaration] in RdrName
Reason: the previosu rules made Trac #5211 go bad again. And
the new rule (a) makes sense to me.
In unravalling this I also ended up doing a few other things
* Refactor RnNames.ImportDeclUsage to use a [GlobalRdrElt] for the
things that are used, rather than [AvailInfo]. This is simpler
and more direct.
* Rename greParentName to greParent_maybe, to follow GHC
naming conventions
* Delete dead code RdrName.greUsedRdrName
Bumps a few submodules.
Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5312
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When two 32-bit floats are adjacent for a 64-bit target, there is no
padding between them to force alignment, so we must combine their bit
representations into a single word.
Reviewers: bgamari, simonmar
Reviewed By: simonmar
Subscribers: rwbarton, carter
GHC Trac Issues: #15853
Differential Revision: https://phabricator.haskell.org/D5306
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current versions of Alex don't seem to produce as many warnings any
more.
In order to silence a warning and to avoid overlong lines, I've taken
the liberty of refactoring 'tok_num'.
Test Plan: ./validate
Reviewers: bgamari, simonmar
Reviewed By: simonmar
Subscribers: erikd, rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5319
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a previous patch we replaced some built-in literal constructors
(MachInt, MachWord, etc.) with a single LitNumber constructor.
In this patch we replace the `Mach` prefix of the remaining constructors
with `Lit` for consistency (e.g., LitChar, LitLabel, etc.).
Sadly the name `LitString` was already taken for a kind of FastString
and it would become misleading to have both `LitStr` (literal
constructor renamed after `MachStr`) and `LitString` (FastString
variant). Hence this patch renames the FastString variant `PtrString`
(which is more accurate) and the literal string constructor now uses the
least surprising `LitString` name.
Both `Literal` and `LitString/PtrString` have recently seen breaking
changes so doing this kind of renaming now shouldn't harm much.
Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27, tdammers
Subscribers: tdammers, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4881
|
|
|
|
|
|
|
|
|
|
|
|
| |
This builds off of D4475.
Bumps binary submodule.
Reviewers: carter, AndreasK, hvr, goldfire, bgamari, simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D5006
|
|
|
|
| |
PR: https://github.com/ghc/ghc/pull/223/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch implements a new code layout algorithm.
It has been tested for x86 and is disabled on other platforms.
Performance varies slightly be CPU/Machine but in general seems to be better
by around 2%.
Nofib shows only small differences of about +/- ~0.5% overall depending on
flags/machine performance in other benchmarks improved significantly.
Other benchmarks includes at least the benchmarks of: aeson, vector, megaparsec, attoparsec,
containers, text and xeno.
While the magnitude of gains differed three different CPUs where tested with
all getting faster although to differing degrees. I tested: Sandy Bridge(Xeon), Haswell,
Skylake
* Library benchmark results summarized:
* containers: ~1.5% faster
* aeson: ~2% faster
* megaparsec: ~2-5% faster
* xml library benchmarks: 0.2%-1.1% faster
* vector-benchmarks: 1-4% faster
* text: 5.5% faster
On average GHC compile times go down, as GHC compiled with the new layout
is faster than the overhead introduced by using the new layout algorithm,
Things this patch does:
* Move code responsilbe for block layout in it's own module.
* Move the NcgImpl Class into the NCGMonad module.
* Extract a control flow graph from the input cmm.
* Update this cfg to keep it in sync with changes during
asm codegen. This has been tested on x64 but should work on x86.
Other platforms still use the old codelayout.
* Assign weights to the edges in the CFG based on type and limited static
analysis which are then used for block layout.
* Once we have the final code layout eliminate some redundant jumps.
In particular turn a sequences of:
jne .foo
jmp .bar
foo:
into
je bar
foo:
..
Test Plan: ci
Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
Reviewed By: RyanGlScott
Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
GHC Trac Issues: #15124
Differential Revision: https://phabricator.haskell.org/D4726
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The logic in `Note [recursive SRTs]` was correct. However, my
implementation of it wasn't: I got the associativity of
`Set.difference` wrong, which led to an extremely subtle and difficult
to find bug.
Fortunately now we have a test case. I was able to cut down the code
to something manageable, and I've added it to the test suite.
Test Plan:
Before (using my stage 1 compiler without the fix):
```
====> T15892(normal) 1 of 1 [0, 0, 0]
cd "T15892.run" && "/home/smarlow/ghc/inplace/bin/ghc-stage1" -o T15892
T15892.hs -dcore-lint -dcmm-lint -no-user-package-db -rtsopts
-fno-warn-missed-specialisations -fshow-warning-groups
-fdiagnostics-color=never -fno-diagnostics-show-caret -Werror=compat
-dno-debug-output -O
cd "T15892.run" && ./T15892 +RTS -G1 -A32k -RTS
Wrong exit code for T15892(normal)(expected 0 , actual 134 )
Stderr ( T15892 ):
T15892: internal error: evacuate: strange closure type 0
(GHC version 8.7.20181113 for x86_64_unknown_linux)
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
Aborted (core dumped)
*** unexpected failure for T15892(normal)
=====> T15892(g1) 1 of 1 [0, 1, 0]
cd "T15892.run" && "/home/smarlow/ghc/inplace/bin/ghc-stage1" -o T15892
T15892.hs -dcore-lint -dcmm-lint -no-user-package-db -rtsopts
-fno-warn-missed-specialisations -fshow-warning-groups
-fdiagnostics-color=never -fno-diagnostics-show-caret -Werror=compat
-dno-debug-output -O
cd "T15892.run" && ./T15892 +RTS -G1 -RTS +RTS -G1 -A32k -RTS
Wrong exit code for T15892(g1)(expected 0 , actual 134 )
Stderr ( T15892 ):
T15892: internal error: evacuate: strange closure type 0
(GHC version 8.7.20181113 for x86_64_unknown_linux)
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
Aborted (core dumped)
```
After (using my stage 2 compiler with the fix):
```
=====> T15892(normal) 1 of 1 [0, 0, 0]
cd "T15892.run" && "/home/smarlow/ghc/inplace/test spaces/ghc-stage2"
-o T15892 T15892.hs -dcore-lint -dcmm-lint -no-user-package-db -rtsopts
-fno-warn-missed-specialisations -fshow-warning-groups
-fdiagnostics-color=never -fno-diagnostics-show-caret -Werror=compat
-dno-debug-output
cd "T15892.run" && ./T15892 +RTS -G1 -A32k -RTS
=====> T15892(g1) 1 of 1 [0, 0, 0]
cd "T15892.run" && "/home/smarlow/ghc/inplace/test spaces/ghc-stage2"
-o T15892 T15892.hs -dcore-lint -dcmm-lint -no-user-package-db -rtsopts
-fno-warn-missed-specialisations -fshow-warning-groups
-fdiagnostics-color=never -fno-diagnostics-show-caret -Werror=compat
-dno-debug-output
cd "T15892.run" && ./T15892 +RTS -G1 -RTS +RTS -G1 -A32k -RTS
```
Reviewers: bgamari, osa1, erikd
Reviewed By: osa1
Subscribers: rwbarton, carter
GHC Trac Issues: #15892
Differential Revision: https://phabricator.haskell.org/D5334
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the first step of implementing:
https://github.com/ghc-proposals/ghc-proposals/pull/74
The main highlights/changes:
primops.txt.pp gets two new sections for two new primitive types for
signed and unsigned 8-bit integers (Int8# and Word8 respectively) along
with basic arithmetic and comparison operations. PrimRep/RuntimeRep get
two new constructors for them. All of the primops translate into the
existing MachOPs.
For CmmCalls the codegen will now zero-extend the values at call
site (so that they can be moved to the right register) and then truncate
them back their original width.
x86 native codegen needed some updates, since it wasn't able to deal
with the new widths, but all the changes are quite localized. LLVM
backend seems to just work.
This is the second attempt at merging this, after the first attempt in
D4475 had to be backed out due to regressions on i386.
Bumps binary submodule.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate (on both x86-{32,64})
Reviewers: bgamari, hvr, goldfire, simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5258
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to always generate direct access for cost centre labels. We
fixed this by generating indirect data load for cost centre defined in
external module.
Test Plan:
The added test used to fail with error message
```
/bin/ld.gold: error: T15723B.o: requires dynamic R_X86_64_PC32 reloc
against 'T15723A_foo1_EXPR_cc' which may overflow at runtime; recompile
with -fPIC
```
and now passes.
Also check that `R_X86_64_PC32` is generated for CostCentre from the
same module and `R_X86_64_GOTPCREL` is generated for CostCentre from
external module:
```
$ objdump -rdS T15723B.o
0000000000000028 <T15723B_test_info>:
28: 48 8d 45 f0 lea -0x10(%rbp),%rax
2c: 4c 39 f8 cmp %r15,%rax
2f: 72 70 jb a1 <T15723B_test_info+0x79>
31: 48 83 ec 08 sub $0x8,%rsp
35: 48 8d 35 00 00 00 00 lea 0x0(%rip),%rsi # 3c
<T15723B_test_info+0x14>
38: R_X86_64_PC32
T15723B_test1_EXPR_cc-0x4
3c: 49 8b bd 60 03 00 00 mov 0x360(%r13),%rdi
43: 31 c0 xor %eax,%eax
45: e8 00 00 00 00 callq 4a <T15723B_test_info+0x22>
46: R_X86_64_PLT32 pushCostCentre-0x4
4a: 48 83 c4 08 add $0x8,%rsp
4e: 48 ff 40 30 incq 0x30(%rax)
52: 49 89 85 60 03 00 00 mov %rax,0x360(%r13)
59: 48 83 ec 08 sub $0x8,%rsp
5d: 49 8b bd 60 03 00 00 mov 0x360(%r13),%rdi
64: 48 8b 35 00 00 00 00 mov 0x0(%rip),%rsi # 6b
<T15723B_test_info+0x43>
67: R_X86_64_GOTPCREL T15723A_foo1_EXPR_cc-0x4
6b: 31 c0 xor %eax,%eax
6d: e8 00 00 00 00 callq 72 <T15723B_test_info+0x4a>
6e: R_X86_64_PLT32 pushCostCentre-0x4
72: 48 83 c4 08 add $0x8,%rsp
76: 48 ff 40 30 incq 0x30(%rax)
```
Reviewers: simonmar, bgamari
Reviewed By: simonmar
Subscribers: rwbarton, carter
GHC Trac Issues: #15723
Differential Revision: https://phabricator.haskell.org/D5214
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The behavior previously enabled by this flag is as been the default
since 8.6.1.
Reviewers: simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5193
|
|
|
|
|
|
|
|
|
| |
This unfortunately broke i386 support since it introduced references to
byte-sized registers that don't exist on that architecture.
Reverts binary submodule
This reverts commit 5d5307f943d7581d7013ffe20af22233273fba06.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the first step of implementing:
https://github.com/ghc-proposals/ghc-proposals/pull/74
The main highlights/changes:
- `primops.txt.pp` gets two new sections for two new primitive types
for signed and unsigned 8-bit integers (`Int8#` and `Word8`
respectively) along with basic arithmetic and comparison
operations. `PrimRep`/`RuntimeRep` get two new constructors for
them. All of the primops translate into the existing `MachOP`s.
- For `CmmCall`s the codegen will now zero-extend the values at call
site (so that they can be moved to the right register) and then
truncate them back their original width.
- x86 native codegen needed some updates, since it wasn't able to deal
with the new widths, but all the changes are quite localized. LLVM
backend seems to just work.
Bumps binary submodule.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate with new tests
Reviewers: hvr, goldfire, bgamari, simonmar
Subscribers: Abhiroop, dfeuer, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4475
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 64c54fff2d6534e1229359a8d357ec1dc6c21b73
("Mark system and internal symbols as private symbols in asm")
Added `internalNamePrefix` helper. Unfortunately it
generates invalid label in unregisterised mode:
```
$ ./configure --enable-unregisterised
/tmp/ghc19372_0/ghc_4.hc:2831:22: error:
error: expected identifier or '(' before '.' token
static const StgWord .Lcl3_info[]__attribute__((aligned(8)))= {
^
```
Here asm-style prefix is applied to C symbol.
The fix is simple: apply asm-style labels only to assembly code.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Reviewers: simonmar, last_g, bgamari
Reviewed By: simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5207
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shortcutting the SRT for a static function can lead to resurrecting a
static object at runtime, which violates assumptions in the GC. See
comments for details.
Test Plan:
- manual testing (in progress)
- validate
Reviewers: osa1, bgamari, erikd
Reviewed By: bgamari
Subscribers: rwbarton, carter
GHC Trac Issues: #15544
Differential Revision: https://phabricator.haskell.org/D5145
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This diff is a part of the bigger project which goal is to improve
common profiling tools support (perf) for GHC binaries.
A similar job was already done and reverted in the past:
* https://phabricator.haskell.org/rGHCb1f453e16f0ce11a2ab18cc4c350bdcbd36299a6
* https://phabricator.haskell.org/rGHCf1f3c4f50650110ad0f700d6566a44c515b0548f
Reasoning:
`Perf` and similar tools build in memory symbol table from the .symtab
section of the ELF file to display human-readable function names instead
of the addresses in the output. `Perf` uses only two types of symbols:
`@function` and `@notype` but GHC is not capable to produce any
`@function` symbols so the `perf` output is pretty useless (All the
haskell symbols that you can see in `perf` now are `@notype` internal
symbols extracted by mistake/hack).
The changes:
* mark code related symbols as @function
* small hack to mark InfoTable symbols as code if TABLES_NEXT_TO_CODE is true
Limitations:
* The perf symbolization support is not complete after this patch but
I'm working on the second patch.
* Constructor symbols are not supported. To fix that we can issue extra
local symbols which mark code sections as code and will be only used
for debug.
Test Plan:
tests
any additional ideas?
Perf output on stock ghc 8.4.1:
```
9.78% FibbSlow FibbSlow [.] ckY_info
9.59% FibbSlow FibbSlow [.] cjqd_info
7.17% FibbSlow FibbSlow [.] c3sg_info
6.62% FibbSlow FibbSlow [.] c1X_info
5.32% FibbSlow FibbSlow [.] cjsX_info
4.18% FibbSlow FibbSlow [.] s3rN_info
3.82% FibbSlow FibbSlow [.] c2m_info
3.68% FibbSlow FibbSlow [.] cjlJ_info
3.26% FibbSlow FibbSlow [.] c3sb_info
3.19% FibbSlow FibbSlow [.] cjPQ_info
3.05% FibbSlow FibbSlow [.] cjQd_info
2.97% FibbSlow FibbSlow [.] cjAB_info
2.78% FibbSlow FibbSlow [.] cjzP_info
2.40% FibbSlow FibbSlow [.] cjOS_info
2.38% FibbSlow FibbSlow [.] s3rK_info
2.27% FibbSlow FibbSlow [.] cjq0_info
2.18% FibbSlow FibbSlow [.] cKQ_info
2.13% FibbSlow FibbSlow [.] cjSl_info
1.99% FibbSlow FibbSlow [.] s3rL_info
1.98% FibbSlow FibbSlow [.] c2cC_info
1.80% FibbSlow FibbSlow [.] s3rO_info
1.37% FibbSlow FibbSlow [.] c2f2_info
...
```
Perf output on patched ghc:
```
7.97% FibbSlow FibbSlow [.] c3rM_info
6.75% FibbSlow FibbSlow [.] 0x000000000032cfa8
6.63% FibbSlow FibbSlow [.] cifA_info
4.98% FibbSlow FibbSlow [.] integerzmgmp_GHCziIntegerziType_eqIntegerzh_info
4.55% FibbSlow FibbSlow [.] chXn_info
4.52% FibbSlow FibbSlow [.] c3rH_info
4.45% FibbSlow FibbSlow [.] chZB_info
4.04% FibbSlow FibbSlow [.] Main_fibbzuslow_info
4.03% FibbSlow FibbSlow [.] stg_ap_0_fast
3.76% FibbSlow FibbSlow [.] chXA_info
3.67% FibbSlow FibbSlow [.] cifu_info
3.25% FibbSlow FibbSlow [.] ci4r_info
2.64% FibbSlow FibbSlow [.] s3rf_info
2.42% FibbSlow FibbSlow [.] s3rg_info
2.39% FibbSlow FibbSlow [.] integerzmgmp_GHCziIntegerziType_eqInteger_info
2.25% FibbSlow FibbSlow [.] integerzmgmp_GHCziIntegerziType_minusInteger_info
2.17% FibbSlow FibbSlow [.] ghczmprim_GHCziClasses_zeze_info
2.09% FibbSlow FibbSlow [.] cicc_info
2.03% FibbSlow FibbSlow [.] 0x0000000000331e15
2.02% FibbSlow FibbSlow [.] s3ri_info
1.91% FibbSlow FibbSlow [.] 0x0000000000331bb8
1.89% FibbSlow FibbSlow [.] ci4N_info
...
```
Reviewers: simonmar, niteria, bgamari, goldfire
Reviewed By: simonmar, bgamari
Subscribers: lelf, rwbarton, thomie, carter
GHC Trac Issues: #15501
Differential Revision: https://phabricator.haskell.org/D4713
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This marks system and internal symbols as private in asm output so those
random generated sysmbols won't appear in .symtab
Reasoning:
* internal symbols don't help to debug because names are just random
* the symbols style breaks perf logic
* internal symbols can take ~75% of the .symtab. In the same time
.symtab can take about 20% of the binary file size
Notice:
This diff mostly makes sense on top of the D4713 (or similar)
Test Plan:
tests
Perf from D4713
```
7.97% FibbSlow FibbSlow [.] c3rM_info
6.75% FibbSlow FibbSlow [.] 0x000000000032cfa8
6.63% FibbSlow FibbSlow [.] cifA_info
4.98% FibbSlow FibbSlow [.] integerzmgmp_GHCziIntegerziType_eqIntegerzh_info
4.55% FibbSlow FibbSlow [.] chXn_info
4.52% FibbSlow FibbSlow [.] c3rH_info
4.45% FibbSlow FibbSlow [.] chZB_info
4.04% FibbSlow FibbSlow [.] Main_fibbzuslow_info
4.03% FibbSlow FibbSlow [.] stg_ap_0_fast
3.76% FibbSlow FibbSlow [.] chXA_info
3.67% FibbSlow FibbSlow [.] cifu_info
3.25% FibbSlow FibbSlow [.] ci4r_info
2.64% FibbSlow FibbSlow [.] s3rf_info
2.42% FibbSlow FibbSlow [.] s3rg_info
2.39% FibbSlow FibbSlow [.] integerzmgmp_GHCziIntegerziType_eqInteger_info
2.25% FibbSlow FibbSlow [.] integerzmgmp_GHCziIntegerziType_minusInteger_info
2.17% FibbSlow FibbSlow [.] ghczmprim_GHCziClasses_zeze_info
2.09% FibbSlow FibbSlow [.] cicc_info
2.03% FibbSlow FibbSlow [.] 0x0000000000331e15
2.02% FibbSlow FibbSlow [.] s3ri_info
1.91% FibbSlow FibbSlow [.] 0x0000000000331bb8
1.89% FibbSlow FibbSlow [.] ci4N_info
...
```
Perf from this patch:
```
15.37% FibbSlow FibbSlow [.] Main_fibbzuslow_info
15.33% FibbSlow FibbSlow [.] integerzmgmp_GHCziIntegerziType_minusInteger_info
13.34% FibbSlow FibbSlow [.] integerzmgmp_GHCziIntegerziType_eqIntegerzh_info
9.24% FibbSlow FibbSlow [.] integerzmgmp_GHCziIntegerziType_plusInteger_info
9.08% FibbSlow FibbSlow [.] frame_dummy
8.25% FibbSlow FibbSlow [.] integerzmgmp_GHCziIntegerziType_eqInteger_info
4.29% FibbSlow FibbSlow [.] 0x0000000000321ab0
3.84% FibbSlow FibbSlow [.] stg_ap_0_fast
3.07% FibbSlow FibbSlow [.] ghczmprim_GHCziClasses_zeze_info
2.39% FibbSlow FibbSlow [.] 0x0000000000321ab7
1.90% FibbSlow FibbSlow [.] 0x00000000003266b8
1.88% FibbSlow FibbSlow [.] base_GHCziNum_zm_info
1.83% FibbSlow FibbSlow [.] 0x0000000000326915
1.34% FibbSlow FibbSlow [.] 0x00000000003248cc
1.07% FibbSlow FibbSlow [.] base_GHCziNum_zp_info
0.98% FibbSlow FibbSlow [.] 0x00000000003247c8
0.80% FibbSlow FibbSlow [.] 0x0000000000121498
0.79% FibbSlow FibbSlow [.] stg_gc_noregs
0.75% FibbSlow FibbSlow [.] 0x0000000000321ad6
0.67% FibbSlow FibbSlow [.] 0x0000000000321aca
0.64% FibbSlow FibbSlow [.] 0x0000000000321b4a
0.61% FibbSlow FibbSlow [.] 0x00000000002ff633
```
Reviewers: simonmar, niteria, bgamari
Reviewed By: simonmar
Subscribers: lelf, angerman, olsner, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4722
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: hvr, bgamari, simonmar, jrtc27
Reviewed By: bgamari
Subscribers: alpmestan, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D5034
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds foldl' to GhcPrelude and changes must occurences
of foldl to foldl'. This leads to better performance especially
for quick builds where GHC does not perform strictness analysis.
It does change strictness behaviour when we use foldl' to turn
a argument list into function applications. But this is only a
drawback if code looks ONLY at the last argument but not at the first.
And as the benchmarks show leads to fewer allocations in practice
at O2.
Compiler performance for Nofib:
O2 Allocations:
-1 s.d. ----- -0.0%
+1 s.d. ----- -0.0%
Average ----- -0.0%
O2 Compile Time:
-1 s.d. ----- -2.8%
+1 s.d. ----- +1.3%
Average ----- -0.8%
O0 Allocations:
-1 s.d. ----- -0.2%
+1 s.d. ----- -0.1%
Average ----- -0.2%
Test Plan: ci
Reviewers: goldfire, bgamari, simonmar, tdammers, monoidal
Reviewed By: bgamari, monoidal
Subscribers: tdammers, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4929
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: validate
Reviewers: bgamari, simonmar
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4957
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It seems like we currently support string literals in Cmm, so we can use
__LINE__ CPP macro in assertion macros. This improves error messages
that previously looked like
ASSERTION FAILED: file (null), line 1302
(null) part now shows the actual file name.
Also inline some single-use string literals in PrimOps.cmm.
Reviewers: bgamari, simonmar, erikd
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4862
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gabor noticed C warning when building unregisterised
64-bit compiler on GHC.Integer.Types (from integer-simple).
Minimised example with a warning:
```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE NoImplicitPrelude #-}
{-# OPTIONS_GHC -Wall #-}
module M (bug) where
import GHC.Prim (Word#, minusWord#, ltWord#)
import GHC.Types (isTrue#)
-- assume Word = Word64
bug :: Word# -> Word#
bug x = if isTrue# (x `ltWord#` 0x8000000000000000##) then 0##
else x `minusWord#` 0x8000000000000000##
```
```
$ LANG=C x86_64-UNREG-linux-gnu-ghc -O1 -c M.hs -fforce-recomp
/tmp/ghc30219_0/ghc_1.hc: In function 'M_bug_entry':
/tmp/ghc30219_0/ghc_1.hc:20:14: error:
warning: integer constant is so large that it is unsigned
```
It's caused by limited handling of integer literals in CmmRegOff.
This change switches to use standard integer literal pretty-printer.
C code before the change:
```c
FN_(M_bug_entry) {
W_ _sAg;
_cAr:
_sAg = *Sp;
switch ((W_)(_sAg < 0x8000000000000000UL)) {
case 0x1UL: goto _cAq;
default: goto _cAp;
}
_cAp:
R1.w = _sAg+-9223372036854775808;
// ...
```
C code after the change:
```c
FN_(M_bug_entry) {
W_ _sAg;
_cAr:
_sAg = *Sp;
switch ((W_)(_sAg < 0x8000000000000000UL)) {
case 0x1UL: goto _cAq;
default: goto _cAp;
}
_cAp:
R1.w = _sAg+(-0x8000000000000000UL);
```
URL: https://mail.haskell.org/pipermail/ghc-devs/2018-June/015875.html
Reported-by: Gabor Greif
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Test Plan: test generated code on unregisterised mips64 and amd64
Reviewers: simonmar, ggreif, bgamari
Reviewed By: ggreif, bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4856
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix UNREG build failure for 32-bit targets.
This change is an equivalent of commit
0238a6c78102d43dae2f56192bd3486e4f9ecf1d
("UNREG: PprC: add support for of W32 literals")
The change allows combining two subwords into one word
on 32-bit targets. Tested on nios2-unknown-linux-gnu.
GHC Trac Issues: #15237
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Today UNREG build fails to generate sub-word literals:
```
rts_dist_HC rts/dist/build/StgStartup.o
ghc-stage1: panic! (the 'impossible' happened)
(GHC version 8.5.20180612 for x86_64-unknown-linux):
pprStatics: cannot emit a non-word-sized static literal
W32
```
The change allows combining two subwords into one word.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Reviewers: simonmar, bgamari
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #15237
Differential Revision: https://phabricator.haskell.org/D4837
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This for some reason or the other and makes it into the final
binary. I've added the check to ContFlowOpt as that seems
like a logical place for this.
In a regular nofib run there were 30 occurences of this pattern.
Test Plan: ci
Reviewers: bgamari, simonmar, dfeuer, jrtc27, tdammers
Reviewed By: bgamari, simonmar
Subscribers: tdammers, dfeuer, rwbarton, thomie, carter
GHC Trac Issues: #15188
Differential Revision: https://phabricator.haskell.org/D4740
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SMALL_MUT_ARR_PTRS_FROZEN0 -> SMALL_MUT_ARR_PTRS_FROZEN_DIRTY
SMALL_MUT_ARR_PTRS_FROZEN -> SMALL_MUT_ARR_PTRS_FROZEN_CLEAN
MUT_ARR_PTRS_FROZEN0 -> MUT_ARR_PTRS_FROZEN_DIRTY
MUT_ARR_PTRS_FROZEN -> MUT_ARR_PTRS_FROZEN_CLEAN
Naming is now consistent with other CLEAR/DIRTY objects (MVAR, MUT_VAR,
MUT_ARR_PTRS).
(alternatively we could rename MVAR_DIRTY/MVAR_CLEAN etc. to MVAR0/MVAR)
Removed a few comments in Scav.c about FROZEN0 being on the mut_list
because it's now clear from the closure type.
Reviewers: bgamari, simonmar, erikd
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4784
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allows easier structural comparison of Cmm code.
Before:
```
cxCH: // global
_suEU::P64 = R1;
if ((Sp + -16) < SpLim) (likely: False) goto cxCI; else goto
cxCJ;
```
After
```
_lbl_: // global
__locVar_::P64 = R1;
if ((Sp + -16) < SpLim) (likely: False) goto cxBf; else goto
cxBg;
```
Test Plan: Looking at dumps, ci
Reviewers: bgamari, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4786
|
|
|
|
| |
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Use toBlockList instead of revPostorder.
Block elimination works on a given Cmm graph by:
* Getting a list of blocks.
* Looking for duplicates in these blocks.
* Removing all but one instance of duplicates.
There are two (reasonable) ways to get the list of blocks.
* The fast way: `toBlockList`
This just flattens the underlying map into a list.
* The convenient way: `revPostorder`
Start at the entry label, scan for reachable blocks and return
only these. This has the advantage of removing all dead code.
If there is dead code the later is better. Work done on unreachable
blocks is clearly wasted work. However by the point we run the
common block elimination pass the input graph already had all dead code
removed. This is done during control flow optimization in
CmmContFlowOpt which is our first Cmm pass.
This means common block elimination is free to use toBlockList
because revPostorder would return the same blocks. (Although in
a different order).
* Change the triemap used for grouping by a label list
from `(TM.ListMap UniqDFM)` to `ListMap (GenMap LabelMap)`.
* Using GenMap offers leaf compression. Which is a trie
optimization described by the Note [Compressed TrieMap] in
CoreSyn/TrieMap.hs
* Using LabelMap removes the overhead associated with UniqDFM.
This is deterministic since if we have the same input keys the same
LabelMap will be constructed.
Test Plan: ci, profiling output
Reviewers: bgamari, simonmar
Reviewed By: bgamari
Subscribers: dfeuer, thomie, carter
GHC Trac Issues: #15103
Differential Revision: https://phabricator.haskell.org/D4597
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Noticed section mismatch on UNREG build failure:
```
HC [stage 1] libraries/integer-gmp/dist-install/build/GHC/Integer/Type.o
error: conflicting types for 'ufu0_srt'
static StgWord ufu0_srt[]__attribute__((aligned(8)))= {
^~~~~~~~
note: previous declaration of 'ufu0_srt' was here
IRO_(ufu0_srt);
^~~~~~~~
```
`IRO_` is a 'const' qualifier.
The error is a leftover from commit 838b69032566ce6ab3918d70e8d5e098d0bcee02
"Merge FUN_STATIC closure with its SRT" where part of SRT was moved
into closure itself and made SRTs writable.
This change puts all SRTs into writable section.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Reviewers: simonmar, bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4731
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately, this optimisation is infeasible on MachO platforms (e.g.
Darwin) due to an object format limitation. Specifically, linking fails
with errors of the form:
error: unsupported relocation with subtraction expression, symbol
'_integerzmgmp_GHCziIntegerziType_quotInteger_closure' can not be
undefined in a subtraction expression
Apparently MachO does not permit relocations' subtraction expressions to
refer to undefined symbols. As far as I can tell this means that it is
essentially impossible to express an offset between symbols living in
different compilation units. This means that we lively can't use this
optimisation on MachO platforms.
Test Plan: Validate on Darwin
Reviewers: simonmar, erikd
Subscribers: rwbarton, thomie, carter, angerman
GHC Trac Issues: #15169
Differential Revision: https://phabricator.haskell.org/D4715
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
I had good intentions, but they were not being followed. In particular,
this comment:
```
--- - we never resolve a reference to a CAF to the contents of its SRT, since
--- the point of SRTs is to keep CAFs alive.
```
was not true, because we updated the srtMap after generating the SRT
for a CAF. Therefore it was possible for another CAF to refer to an
earlier CAF, and the reference to the earlier CAF would be shortcutted
to refer to its SRT instead of pointing to the CAF itself.
The fix is just to not update the srtMap when generating the SRT for a
CAF, but I also refactored the code and comments around this to be a bit
better organised.
Test Plan: Harbourmaster
Reviewers: bgamari, michalt, simonpj, erikd
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #15173, #15168
Differential Revision: https://phabricator.haskell.org/D4721
|
| |
|
|
|
|
| |
Addressing review comments on D4637
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The idea here is to save a little code size and some work in the GC,
by collapsing FUN_STATIC closures and their SRTs.
This is (4) in a series; see D4632 for more details.
There's a tradeoff here: more complexity in the compiler in exchange
for a modest code size reduction (probably around 0.5%).
Results:
* GHC binary itself (statically linked) is 1% smaller
* -0.2% binary sizes in nofib (-0.5% module sizes)
Full nofib results comparing D4634 with this: P177 (ignore runtimes,
these aren't stable on my laptop)
Test Plan: validate, nofib
Reviewers: bgamari, niteria, simonpj, erikd
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4637
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
An info table with an SRT normally looks like this:
StgWord64 srt_offset
StgClosureInfo layout
StgWord32 layout
StgWord32 has_srt
But we only need 32 bits for srt_offset on x86_64, because the small
memory model requires that code segments are at most 2GB. So we can
optimise this to
StgClosureInfo layout
StgWord32 layout
StgWord32 srt_offset
saving a word. We can tell whether the info table has an SRT or not,
because zero is not a valid srt_offset, so zero still indicates that
there's no SRT.
Test Plan:
* validate
* For results, see D4632.
Reviewers: bgamari, niteria, osa1, erikd
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4634
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change makes it possible to generate a static 32-bit relative label
offset on x86_64. Currently we can only generate word-sized label
offsets.
This will be used in D4634 to shrink info tables. See D4632 for more
details.
Test Plan: See D4632
Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4633
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- Previously we would hvae a single big table of pointers per module,
with a set of bitmaps to reference entries within it. The new
representation is identical to a static constructor, which is much
simpler for the GC to traverse, and we get to remove the complicated
bitmap-traversal code from the GC.
- Rewrite all the code to generate SRTs in CmmBuildInfoTables, and
document it much better (see Note [SRTs]). This has been something
I've wanted to do since we moved to the new code generator, I
finally had the opportunity to finish it while on a transatlantic
flight recently :)
There are a series of 4 diffs:
1. D4632 (this one), which does the bulk of the changes
2. D4633 which adds support for smaller `CmmLabelDiffOff` constants
3. D4634 which takes advantage of D4632 and D4633 to save a word in
info tables that have an SRT on x86_64. This is where most of the
binary size improvement comes from.
4. D4637 which makes a further optimisation to merge some SRTs with
static FUN closures. This adds some complexity and the benefits
are fairly modest, so it's not clear yet whether we should do this.
Results (after (3), on x86_64)
- GHC itself (staticaly linked) is 5.2% smaller
- -1.7% binary sizes in nofib, -2.9% module sizes. Full nofib results: P176
- I measured the overhead of traversing all the static objects in a
major GC in GHC itself by doing `replicateM_ 1000 performGC` as the
first thing in `Main.main`. The new version was 5-10% faster, but
the results did vary quite a bit.
- I'm not sure if there's a compile-time difference, the results are
too unreliable.
Test Plan: validate
Reviewers: bgamari, michalt, niteria, simonpj, erikd, osa1
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4632
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is mostly for congruence with 'subWordC#' and '{add,sub}IntC#'.
I found 'plusWord2#' while implementing this, which both lacks
documentation and has a slightly different specification than
'addWordC#', which means the generic implementation is unnecessarily
complex.
While I was at it, I also added lacking meta-information on PrimOps
and refactored 'subWordC#'s generic implementation to be branchless.
Reviewers: bgamari, simonmar, jrtc27, dfeuer
Reviewed By: bgamari, dfeuer
Subscribers: dfeuer, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4592
|