Fixes #16052
When the offset in `setByteArray#` is statically known, we can provide
better alignment guarantees than just 1 byte.
Also, memset can now do 64-bit wide sets.
The current memset intrinsic is not optimal however and can be
improved for the case when we know that we deal with
(baseAddress at known alignment) + offset
For instance, on 64-bit
`setByteArray# s 1# 23# 0#`
given that bytearray is 8 bytes aligned could be unrolled into
`movb, movw, movl, movq, movq`; but currently it is
`movb x23` since alignment of 1 is all we can embed into MO_Memset op.
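The unrolling described above can be sketched as follows. The sketch assumes the byte array payload sits at an 8-byte aligned address, so the offset alone determines the alignment of each store. `storeWidths` is a hypothetical helper for illustration. It is not part of GHC's `MO_Memset` lowering.

```haskell
-- Hypothetical helper (not GHC code): split a memset of `len` bytes that
-- starts `off` bytes past a base address aligned to `maxWidth` bytes into
-- the widest aligned stores possible. Widths are in bytes:
-- 1 = movb, 2 = movw, 4 = movl, 8 = movq.
storeWidths :: Int -> Int -> Int -> [Int]
storeWidths maxWidth off len
  | len <= 0  = []
  | otherwise = w : storeWidths maxWidth (off + w) (len - w)
  where
    -- Largest width that keeps the store aligned and within bounds.
    -- Width 1 always qualifies, so the list is never empty.
    w = maximum [ c | c <- [1, 2, 4, 8], c <= maxWidth, off `mod` c == 0, c <= len ]
```

Under these assumptions, `storeWidths 8 1 23` yields `[1,2,4,8,8]`. That is the `movb, movw, movl, movq, movq` sequence from the example.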
This commit includes the necessary changes in code and
documentation to support a primop that reverses a word's
bits. It also includes a test.
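The semantics of reversing a word's bits can be written with `Data.Bits`: bit i moves to bit n - 1 - i. This is a slow, portable reference sketch. The names `reverseBits`, `reverseWord8` and `reverseWord64` are hypothetical, and the sketch says nothing about how the new primop is compiled.

```haskell
import Data.Bits (FiniteBits, finiteBitSize, setBit, testBit, zeroBits)
import Data.List (foldl')
import Data.Word (Word64, Word8)

-- Reference semantics: bit i of the input becomes bit (n - 1 - i) of the
-- result, where n is the bit width. One step per bit, so slow but portable.
reverseBits :: FiniteBits a => a -> a
reverseBits x = foldl' step zeroBits [0 .. n - 1]
  where
    n = finiteBitSize x
    step acc i
      | testBit x i = setBit acc (n - 1 - i)
      | otherwise   = acc

reverseWord8 :: Word8 -> Word8
reverseWord8 = reverseBits

reverseWord64 :: Word64 -> Word64
reverseWord64 = reverseBits
```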
This moves all URL references to Trac Wiki to their corresponding
GitLab counterparts.
This substitution is classified as follows:
1. Automated substitution using sed with Ben's mapping rule [1]
Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...
New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...
2. Manual substitution for URLs containing `#` index
Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz
New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz
3. Manual substitution for strings starting with `Commentary`
Old: Commentary/XxxYyy...
New: commentary/xxx-yyy...
See also !539
[1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
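The CamelCase to kebab-case rule from step 1 can be approximated as below. This is a hypothetical sketch, and `toWikiSlug` and `tracToGitLab` are made-up names. The authoritative mapping is the JSON file in [1], which may treat acronyms and digits differently.

```haskell
import Data.Char (isLower, isUpper, toLower)
import Data.List (stripPrefix)

-- Approximate slug rule: lower-case every letter and insert '-' before an
-- upper-case letter that directly follows a lower-case letter. '/' and '#'
-- are kept, so "Commentary/XxxYyy#Zzz" becomes "commentary/xxx-yyy#zzz".
toWikiSlug :: String -> String
toWikiSlug s = concat (zipWith conv (Nothing : map Just s) s)
  where
    conv prev c
      | isUpper c, Just p <- prev, isLower p = ['-', toLower c]
      | otherwise                            = [toLower c]

-- Rewrite a Trac wiki URL (without scheme) to its GitLab counterpart;
-- anything that is not a Trac wiki URL is left alone (Nothing).
tracToGitLab :: String -> Maybe String
tracToGitLab url =
  ("gitlab.haskell.org/ghc/ghc/wikis/" ++) . toWikiSlug
    <$> stripPrefix "ghc.haskell.org/trac/ghc/wiki/" url
```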
We make liveness information for global registers
available on `JMP` and `BCTR`, which were the last instructions
missing. With complete liveness information we do not need to
reserve global registers in `freeReg` anymore. Moreover we
assign R9 and R10 to callee-saved registers.
Cleanup by removing `Reg_Su`, which was unused, from `freeReg`
and removing unused register definitions.
The calculation of the number of floating point registers is too
conservative. Just follow X86 and specify the constants directly.
Overall on PowerPC this results in 0.3 % smaller code size in nofib
while runtime is slightly better in some tests.
This moves all URL references to Trac tickets to their corresponding
GitLab counterparts.
GHC's native code generator generates .incbin and .file directives. We
need to escape those strings correctly on Windows (see #16389).
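Below is a minimal sketch of the kind of escaping needed. It assumes only backslashes and double quotes need treatment inside a double-quoted assembler string. `escapeAsmString` and `incbinDirective` are hypothetical names, and GHC's actual escaping may cover more characters.

```haskell
-- Escape a path for use inside a double-quoted GNU assembler string.
-- Windows paths contain backslashes, which the assembler would otherwise
-- read as escape sequences.
escapeAsmString :: String -> String
escapeAsmString = concatMap esc
  where
    esc '\\' = "\\\\"
    esc '"'  = "\\\""
    esc c    = [c]

-- Build an .incbin directive for a (possibly Windows-style) path.
incbinDirective :: FilePath -> String
incbinDirective path = ".incbin \"" ++ escapeAsmString path ++ "\""
```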
The splitter is an evil Perl script that processes assembler code.
Its job can be done better by the linker's --gc-sections flag. GHC
passes this flag to the linker whenever -split-sections is passed on
the command line.
This is based on @DemiMarie's D2768.
Fixes Trac #11315
Fixes Trac #9832
Fixes Trac #8964
Fixes Trac #8685
Fixes Trac #8629
It never really encoded an invariant.
* The linear register allocator just did partial pattern matches
* The graph allocator just set it to (Just mapEmpty) for Nothing
So I changed LiveInfo to directly contain the map.
Further natCmmTopToLive which filled in Nothing is no longer exported.
Instead we now call cmmTopLiveness, which changes the type AND fills
in the map.
This patch adds an optimization into the NCG: for large strings
(threshold configurable via -fbinary-blob-threshold=NNN flag), instead
of printing `.asciz "..."` in the generated ASM source, we print
`.incbin "tmpXXX.dat"` and we dump the contents of the string into a
temporary "tmpXXX.dat" file.
See the note for more details.
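The decision itself is small. The sketch below is a hypothetical model, and `StringDirective` and `chooseDirective` are made-up names. It assumes strings strictly longer than the threshold go to a side file, and it uses `Nothing` to stand for the feature being off. See the note for the real rules.

```haskell
-- Hypothetical model of how one string literal is emitted.
data StringDirective
  = Asciz String     -- contents printed inline: .asciz "..."
  | Incbin FilePath  -- contents dumped to this file: .incbin "..."
  deriving (Eq, Show)

-- Assumption: strings strictly longer than the threshold use .incbin;
-- a Nothing threshold stands for the feature being switched off.
chooseDirective :: Maybe Int -> FilePath -> String -> StringDirective
chooseDirective (Just threshold) tmpFile str
  | length str > threshold = Incbin tmpFile
chooseDirective _ _ str = Asciz str
```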
The graph allocator now dynamically resizes the number of stack
slots when running into the limit.
This fixes #8657.
Also loop membership of basic blocks is now available
in the register allocator for cost heuristics.
Fixes #16222
* Replace `takeL/R 1` occurrences with lastOL/headOL.
* Make BlockChain an OrdList newtype by removing the set of blocks.
Initially BlockChain contained both a set for membership tests
and an ordered list of blocks. The set is not used for any
performance sensitive lookups so we get rid of it.
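For readers unfamiliar with OrdList, here is a cut-down version that shows why `headOL`/`lastOL` are preferable to `takeL/R 1`. Appending is O(1), and the ends can be inspected without flattening the structure. GHC's real `OrdList` has more constructors, and its `headOL`/`lastOL` panic on an empty list. The `Maybe`-returning versions here are a simplification.

```haskell
-- Cut-down OrdList: O(1) append and snoc; elements are only flattened
-- into a list on demand.
data OrdList a = None | One a | Two (OrdList a) (OrdList a)

appOL :: OrdList a -> OrdList a -> OrdList a
appOL None b = b
appOL a None = a
appOL a b    = Two a b

snocOL :: OrdList a -> a -> OrdList a
snocOL xs x = appOL xs (One x)

fromOL :: OrdList a -> [a]
fromOL t = go t []
  where
    go None      acc = acc
    go (One x)   acc = x : acc
    go (Two a b) acc = go a (go b acc)

-- First and last element without flattening the whole structure.
headOL :: OrdList a -> Maybe a
headOL None      = Nothing
headOL (One x)   = Just x
headOL (Two a b) = maybe (headOL b) Just (headOL a)

lastOL :: OrdList a -> Maybe a
lastOL None      = Nothing
lastOL (One x)   = Just x
lastOL (Two a b) = maybe (lastOL a) Just (lastOL b)
```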
OrdList does the same thing and more so there is no reason
to have both.
* Use `ByteString.foldr` instead of `(List.foldr . BS.unpack)`
* Avoid calling `chr` and its test that checks for invalid Unicode
codepoints: we stay in the ASCII range so we know we're ok
* Avoid calling `isPrint` (unsafe FFI call): we can check the ASCII
printable range directly
* Use bit operations (`unsafeShiftR`, `.&.`) instead of `div` and `mod`
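The four points combine roughly as in this sketch, which prints the body of an assembler string literal. It illustrates the spirit of the change and is not GHC's code. `pprAsciiBody` and `escapeByte` are hypothetical names, and the exact escaping rules (which bytes are printed verbatim, three-digit octal escapes) are assumptions. `w2c` from `Data.ByteString.Internal` converts a byte to a `Char` without the range check that `chr` performs.

```haskell
import Data.Bits (unsafeShiftR, (.&.))
import qualified Data.ByteString as BS
import Data.ByteString.Internal (w2c)
import Data.Word (Word8)

-- Stream the bytes with ByteString.foldr instead of unpacking to a list.
pprAsciiBody :: BS.ByteString -> String
pprAsciiBody = BS.foldr escapeByte ""

-- Printable ASCII (0x20..0x7E) is emitted as-is, except '"' (0x22) and
-- '\' (0x5C); every other byte becomes a three-digit octal escape \ooo.
escapeByte :: Word8 -> String -> String
escapeByte w rest
  | w >= 0x20 && w <= 0x7E && w /= 0x22 && w /= 0x5C = w2c w : rest
  | otherwise =
      '\\' : octDigit (w `unsafeShiftR` 6)
           : octDigit ((w `unsafeShiftR` 3) .&. 7)
           : octDigit (w .&. 7)
           : rest
  where
    -- 0..7 map to '0'..'7' (0x30..0x37), so no Unicode check is needed.
    octDigit d = w2c (0x30 + d)
```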
Also used ByteString in some other relevant places
under -mbmi2
This works similarly to existing implementation for popCount.
Trac ticket: #16086.
This reverts commit 76c8fd674435a652c75a96c85abbf26f1f221876.
Add missing color mappings to regDotColor for amd64.
Also set fakeRegs to red instead of xmm regs.
Rename constructors in calling convention data type to reflect the
fact that they represent an ELF ABI not only a Linux ABI.
All operating systems except AIX and Darwin follow the ELF
specification.
There is only one place where UPDATE_SP was used. Instead of the
UPDATE_SP pseudo instruction build the list of instructions directly.
Support for Mac OS X on PowerPC has been dropped by Apple years ago. We
follow suit and remove PowerPC support for Darwin.
Fixes #16106.
Handle Int*QuotRemOp and Word*QuotRemOp in PPC NCG.
Refactor common code with remainder operation.
Test Plan: validate (I validated on Linux powerpc64le and x86_64)
Reviewers: erikd, hvr, bgamari, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5323
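The idea shared by the quotient and remainder code can be sketched in plain Haskell. The sketch assumes a divide instruction that produces only a truncated quotient, as PowerPC's `divw`/`divd` do, so the remainder is recovered as `a - q * b`. `quotRemViaDiv` is a hypothetical name. The NCG emits the corresponding divide, multiply and subtract instructions instead.

```haskell
-- Derive quotient and remainder from one truncating division, as a target
-- whose divide instruction yields only the quotient has to:
-- r = a - (a `quot` b) * b.
quotRemViaDiv :: Integral a => a -> a -> (a, a)
quotRemViaDiv a b = (q, a - q * b)
  where
    q = a `quot` b  -- truncates toward zero
```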
Generate code for MachOps with smaller than wordsize data.
Refactor conversion MachOps.
Fixes #15854
Test Plan: validate (I validated on powerpc64le and x86_64 Linux)
Reviewers: bgamari, hvr, erikd, simonmar
Subscribers: rwbarton, carter
GHC Trac Issues: #15854
Differential Revision: https://phabricator.haskell.org/D5300
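Narrow MachOps need care on a 64-bit target. If an 8-bit value lives in the low byte of a full register, a signed comparison on the raw register contents gives the wrong answer for negative values. The model below is a hypothetical sketch, and `signExtend8` and `lt8` are made-up names. In generated code this corresponds to a sign-extension instruction, such as PowerPC's `extsb`, before the compare.

```haskell
import Data.Int (Int64, Int8)
import Data.Word (Word64)

-- A 64-bit register whose low 8 bits hold an Int8 value; the upper bits
-- are not meaningful, so a signed operation first sign-extends the low
-- byte to the full register width.
signExtend8 :: Word64 -> Int64
signExtend8 reg = fromIntegral (fromIntegral reg :: Int8)

-- Signed 8-bit less-than on register contents.
lt8 :: Word64 -> Word64 -> Bool
lt8 x y = signExtend8 x < signExtend8 y
```

For example, a register holding `0xFF` represents -1, so `lt8 0xFF 0x01` is `True` even though `0xFF < 0x01` on the raw 64-bit value is `False`.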
My original goal was (Trac #15809) to move towards using level numbers
as the basis for deciding which type variables to generalise, rather
than searching for the free variables of the environment. However
it has turned into a truly major refactoring of the kind inference
engine.
Let's deal with the level-numbers part first:
* Augment quantifyTyVars to calculate the type variables to
quantify using level numbers, and compare the result with
the existing approach. That is, no change in behaviour,
just a WARNing if the two approaches give different answers.
* To do this I had to get the level number right when calling
quantifyTyVars, and this entailed a bit of care, especially
in the code for kind-checking type declarations.
* However, on the way I was able to eliminate or simplify
a number of calls to solveEqualities.
This work is incomplete: I'm not /using/ level numbers yet.
When I subsequently get rid of any remaining WARNings in
quantifyTyVars, that the level-number answers differ from
the current answers, then I can rip out the current
"free vars of the environment" stuff.
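The level-number idea can be illustrated with a toy model. The model assumes the usual invariant that a unification variable's level is promoted whenever it is unified with a type from an outer level. Under that assumption, the variables safe to quantify at level n are exactly those with a level greater than n. `MetaTv` and `quantifiableAt` are hypothetical names and are far simpler than GHC's `TcLevel` machinery.

```haskell
-- Toy unification variable: a name plus the level at which it was created
-- (or to which it was later promoted).
data MetaTv = MetaTv
  { tvName  :: String
  , tvLevel :: Int
  } deriving (Eq, Show)

-- When generalising at level `outer`, quantify the variables from deeper
-- levels; no search of the environment's free variables is needed.
quantifiableAt :: Int -> [MetaTv] -> [MetaTv]
quantifiableAt outer = filter (\tv -> tvLevel tv > outer)
```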
Anyway, this led me into a deep dive into kind inference for type and
class declarations, which is an increasingly soggy part of GHC.
Richard already did some good work recently in
commit 5e45ad10ffca1ad175b10f6ef3327e1ed8ba25f3
Date: Thu Sep 13 09:56:02 2018 +0200
Finish fix for #14880.
The real change that fixes the ticket is described in
Note [Naughty quantification candidates] in TcMType.
but I kept turning over stones. So this patch has ended up
with a pretty significant refactoring of that code too.
Kind inference for types and classes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Major refactoring in the way we generalise the inferred kind of
a TyCon, in kcTyClGroup. Indeed, I made it into a new top-level
function, generaliseTcTyCon. Plus a new Note to explain it
Note [Inferring kinds for type declarations].
* We decided (Trac #15592) not to treat class type variables specially
when dealing with Inferred/Specified/Required for associated types.
That simplifies things quite a bit. I also rewrote
Note [Required, Specified, and Inferred for types]
* Major refactoring of the crucial function kcLHsQTyVars:
I split it into
kcLHsQTyVars_Cusk and kcLHsQTyVars_NonCusk
because the two are really quite different. The CUSK case is
almost entirely rewritten, and is much easier because of our new
decision not to treat the class variables specially
* I moved all the error checks from tcTyClTyVars (which was a bizarre
place for it) into generaliseTcTyCon and/or the CUSK case of
kcLHsQTyVars. Now tcTyClTyVars is extremely simple.
* I got rid of all the subtleties in tcImplicitTKBndrs. Indeed
now there is no difference between tcImplicitTKBndrs and
kcImplicitTKBndrs; there is now a single bindImplicitTKBndrs.
Same for kc/tcExplicitTKBndrs. None of them monkey with level
numbers, nor build implication constraints. scopeTyVars is gone
entirely, as is kcLHsQTyVarBndrs. It's vastly simpler.
I found I could get rid of kcLHsQTyVarBndrs entirely, in favour of
the new bindExplicitTKBndrs.
Quantification
~~~~~~~~~~~~~~
* I now deal with the "naughty quantification candidates"
of the previous patch in candidateQTyVars, rather than in
quantifyTyVars; see Note [Naughty quantification candidates]
in TcMType.
I also killed off closeOverKindsCQTvs in favour of the same
strategy that we use for tyCoVarsOfType: namely, close over kinds
at the occurrences.
And candidateQTyVars no longer needs a gbl_tvs argument.
* Passing the ContextKind, rather than the expected kind itself,
to tc_hs_sig_type_and_gen makes it easy to allocate the expected
result kind (when we are in inference mode) at the right level.
Type families
~~~~~~~~~~~~~~
* I did a major rewrite of the impenetrable tcFamTyPats. The result
is vastly more comprehensible.
* I got rid of kcDataDefn entirely, quite a big function.
* I re-did the way that checkConsistentFamInst works, so
that it allows alpha-renaming of invisible arguments.
* The interaction of kind signatures and family instances is tricky.
Type families: see Note [Apparently-nullary families]
Data families: see Note [Result kind signature for a data family instance]
and Note [Eta-reduction for data families]
* The consistent instantiation of an associated type family is tricky.
See Note [Checking consistent instantiation] and
Note [Matching in the consistent-instantation check]
in TcTyClsDecls. It's now checked in TcTyClsDecls because that is
when we have the relevant info to hand.
* I got tired of the compromises in etaExpandFamInst, so I did the
job properly by adding a field cab_eta_tvs to CoAxBranch.
See Coercion.etaExpandCoAxBranch.
tcInferApps and friends
~~~~~~~~~~~~~~~~~~~~~~~
* I got rid of the mysterious and horrible ClsInstInfo argument
to tcInferApps, checkExpectedKindX, and various checkValid
functions. It was horrible!
* I got rid of [Type] result of tcInferApps. This list was used
only in tcFamTyPats, when checking the LHS of a type instance;
and if there is a cast in the middle, the list is meaningless.
So I made tcInferApps simpler, and moved the complexity
(not much) to tcFamTyPats.
Result: tcInferApps is now pretty comprehensible again.
* I refactored the many functions in TcMType that instantiate skolems.
Smaller things
* I rejigged the error message in checkValidTelescope; I think it's
quite a bit better now.
* checkValidType was not rejecting constraints in a kind signature
forall (a :: Eq b => blah). blah2
That led to further errors when we then do an ambiguity check.
So I make checkValidType reject it more aggressively.
* I killed off quantifyConDecl, instead calling kindGeneralize
directly.
* I fixed an outright bug in tyCoVarsOfImplic, where we were not
collecting the tyvar of the kind of the skolems
* Renamed ClsInstInfo to AssocInstInfo, and made it into its
own data type
* Some fiddling around with pretty-printing of family
instances which was trickier than I thought. I wanted
wildcards to print as plain "_" in user messages, although
they each need a unique identity in the CoAxBranch.
Some other oddments
* Refactoring around the trace messages from reportUnsolved.
* A bit of extra tc-tracing in TcHsSyn.commitFlexi
This patch fixes a raft of bugs, and includes tests for them.
* #14887
* #15740
* #15764
* #15789
* #15804
* #15817
* #15870
* #15874
* #15881
This patch fixes a fairly long-standing bug (dating back to 2015) in
RdrName.bestImport, namely
commit 9376249b6b78610db055a10d05f6592d6bbbea2f
Author: Simon Peyton Jones <simonpj@microsoft.com>
Date: Wed Oct 28 17:16:55 2015 +0000
Fix unused-import stuff in a better way
In that patch I got the sense of the comparison back to front, and
thereby failed to implement the unused-import rules described in
Note [Choosing the best import declaration] in RdrName
This led to Trac #13064 and #15393
Fixing this bug revealed a bunch of unused imports in libraries;
the ones in the GHC repo are part of this commit.
The two important changes are
* Fix the bug in bestImport
* Modified the rules by adding (a) in
Note [Choosing the best import declaration] in RdrName
Reason: the previous rules made Trac #5211 go bad again. And
the new rule (a) makes sense to me.
In unravelling this I also ended up doing a few other things
* Refactor RnNames.ImportDeclUsage to use a [GlobalRdrElt] for the
things that are used, rather than [AvailInfo]. This is simpler
and more direct.
* Rename greParentName to greParent_maybe, to follow GHC
naming conventions
* Delete dead code RdrName.greUsedRdrName
Bumps a few submodules.
Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5312
When splitting objects we sometimes generate
dummy CmmProcs containing bottom in some fields.
Code introduced in the new code layout patch looked
at these which blew up the compiler. Now we instead
check first if the function actually contains code.
Reviewers: bgamari
Subscribers: simonpj, rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5357
In a previous patch we replaced some built-in literal constructors
(MachInt, MachWord, etc.) with a single LitNumber constructor.
In this patch we replace the `Mach` prefix of the remaining constructors
with `Lit` for consistency (e.g., LitChar, LitLabel, etc.).
Sadly the name `LitString` was already taken for a kind of FastString
and it would become misleading to have both `LitStr` (literal
constructor renamed after `MachStr`) and `LitString` (FastString
variant). Hence this patch renames the FastString variant `PtrString`
(which is more accurate) and the literal string constructor now uses the
least surprising `LitString` name.
Both `Literal` and `LitString/PtrString` have recently seen breaking
changes so doing this kind of renaming now shouldn't harm much.
Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27, tdammers
Subscribers: tdammers, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4881
Summary:
This patch implements a new code layout algorithm.
It has been tested for x86 and is disabled on other platforms.
Performance varies slightly by CPU/machine but in general seems to be better
by around 2%.
Nofib shows only small differences of about +/- ~0.5% overall, depending on
flags/machine; performance in other benchmarks improved significantly.
Other benchmarks include at least the benchmarks of: aeson, vector, megaparsec, attoparsec,
containers, text and xeno.
While the magnitude of gains differed, three different CPUs were tested and
all got faster, although to differing degrees. I tested: Sandy Bridge (Xeon), Haswell,
Skylake
* Library benchmark results summarized:
* containers: ~1.5% faster
* aeson: ~2% faster
* megaparsec: ~2-5% faster
* xml library benchmarks: 0.2%-1.1% faster
* vector-benchmarks: 1-4% faster
* text: 5.5% faster
On average, GHC compile times go down: the speedup of a GHC compiled with the new
layout outweighs the overhead introduced by using the new layout algorithm.
Things this patch does:
* Move code responsible for block layout into its own module.
* Move the NcgImpl Class into the NCGMonad module.
* Extract a control flow graph from the input cmm.
* Update this cfg to keep it in sync with changes during
asm codegen. This has been tested on x64 but should work on x86.
Other platforms still use the old codelayout.
* Assign weights to the edges in the CFG based on type and limited static
analysis which are then used for block layout.
* Once we have the final code layout eliminate some redundant jumps.
In particular, turn a sequence of:
jne .foo
jmp .bar
.foo:
into
je .bar
.foo:
..
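The jump rewrite in the last bullet can be sketched as a peephole pass over a heavily simplified instruction type. `Instr`, `Cond` and `elimRedundantJumps` are hypothetical names. The real pass works on GHC's x86 instruction type and only runs once the final layout is known. The label is kept because other jumps may still target it.

```haskell
-- Hypothetical, heavily simplified instruction type.
data Cond = E | NE deriving (Eq, Show)

data Instr
  = JCond Cond String  -- conditional jump to a label
  | Jmp String         -- unconditional jump
  | Label String       -- label definition
  | Other String       -- anything else
  deriving (Eq, Show)

invertCond :: Cond -> Cond
invertCond E  = NE
invertCond NE = E

-- "jcc foo; jmp bar; foo:" ==> "j(!cc) bar; foo:". The conditional jump
-- only skipped over the jmp, so inverting it lets it replace the jmp.
elimRedundantJumps :: [Instr] -> [Instr]
elimRedundantJumps (JCond c t : Jmp l : Label t' : rest)
  | t == t' = JCond (invertCond c) l : Label t' : elimRedundantJumps rest
elimRedundantJumps (i : rest) = i : elimRedundantJumps rest
elimRedundantJumps []         = []
```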
Test Plan: ci
Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
Reviewed By: RyanGlScott
Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
GHC Trac Issues: #15124
Differential Revision: https://phabricator.haskell.org/D4726
This is the first step of implementing:
https://github.com/ghc-proposals/ghc-proposals/pull/74
The main highlights/changes:
primops.txt.pp gets two new sections for two new primitive types for
signed and unsigned 8-bit integers (Int8# and Word8# respectively) along
with basic arithmetic and comparison operations. PrimRep/RuntimeRep get
two new constructors for them. All of the primops translate into the
existing MachOps.
For CmmCalls the codegen will now zero-extend the values at call
site (so that they can be moved to the right register) and then truncate
them back to their original width.
x86 native codegen needed some updates, since it wasn't able to deal
with the new widths, but all the changes are quite localized. LLVM
backend seems to just work.
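The widen-then-truncate round trip at call sites can be modelled with ordinary conversions. The function names are hypothetical. `Int8` is widened through its bit pattern, so it is zero-extended rather than sign-extended, and truncation recovers the original value.

```haskell
import Data.Int (Int8)
import Data.Word (Word64, Word8)

-- Widen an 8-bit argument to a full 64-bit register (zero-extension).
widenWord8 :: Word8 -> Word64
widenWord8 = fromIntegral

-- Truncate a register back to its low 8 bits.
narrowWord8 :: Word64 -> Word8
narrowWord8 = fromIntegral

-- Int8 goes through its bit pattern (Word8), so it is zero-extended,
-- not sign-extended; truncating restores the original value.
widenInt8 :: Int8 -> Word64
widenInt8 = widenWord8 . (fromIntegral :: Int8 -> Word8)

narrowInt8 :: Word64 -> Int8
narrowInt8 = (fromIntegral :: Word8 -> Int8) . narrowWord8
```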
This is the second attempt at merging this, after the first attempt in
D4475 had to be backed out due to regressions on i386.
Bumps binary submodule.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate (on both x86-{32,64})
Reviewers: bgamari, hvr, goldfire, simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5258
Summary:
Encountered assembly error due to undefined label `.LcaDcU_info_end` for
following code generated by `pprFrameProc`:
```
.Lsat_sa8fp{v}_info_fde_end:
.long .Lblock{v caDcU}_info_fde_end-.Lblock{v caDcU}_info_fde
.Lblock{v caDcU}_info_fde:
.long _nbHlD-.Lsection_frame
.quad block{v caDcU}_info-1
.quad .Lblock{v caDcU}_info_end-block{v caDcU}_info+1
.byte 1
```
This diff fixed the error.
Test Plan:
./validate
Also the case where we used to have assembly error is now fixed.
Unfortunately, I have limited insight here and cannot get a small enough repro
or test case for this.
Ben says:
> I think I see: Previously we only produced end symbols for the info
> tables of top-level procedures. However, blocks within a procedure may
> also have info tables, for which we dutifully generate debug information,
> and consequently we get undefined symbols.
Reviewers: simonmar, scpmw, last_g, bgamari
Reviewed By: bgamari
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5246
This unfortunately broke i386 support since it introduced references to
byte-sized registers that don't exist on that architecture.
Reverts binary submodule
This reverts commit 5d5307f943d7581d7013ffe20af22233273fba06.
This is the first step of implementing:
https://github.com/ghc-proposals/ghc-proposals/pull/74
The main highlights/changes:
- `primops.txt.pp` gets two new sections for two new primitive types
for signed and unsigned 8-bit integers (`Int8#` and `Word8#`
respectively) along with basic arithmetic and comparison
operations. `PrimRep`/`RuntimeRep` get two new constructors for
them. All of the primops translate into the existing `MachOp`s.
- For `CmmCall`s the codegen will now zero-extend the values at call
site (so that they can be moved to the right register) and then
truncate them back to their original width.
- x86 native codegen needed some updates, since it wasn't able to deal
with the new widths, but all the changes are quite localized. LLVM
backend seems to just work.
Bumps binary submodule.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate with new tests
Reviewers: hvr, goldfire, bgamari, simonmar
Subscribers: Abhiroop, dfeuer, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4475
Summary:
Optimisation: we don't have to test the parity flag if we
know the test has already excluded the unordered case: e.g. >
and >= test for a zero carry flag, which can only occur for
ordered operands.
By reversing comparisons we can avoid testing the parity
for < and <= as well. This works since:
* If either argument is a NaN, CF gets set, resulting in a false result.
* Since this allows us to rule out NaN we can exchange the arguments and invert the
direction of the arrows.
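The argument can be checked with a small model of the flags that `ucomisd` sets. For unordered operands ZF, PF and CF are all set; for greater, none; for less, CF; for equal, ZF. `ja` tests CF = 0 and ZF = 0, and `jae` tests CF = 0. Both are therefore false for unordered operands without looking at PF, and `<`/`<=` follow by swapping the arguments. Equality is different: ZF is also set for unordered operands, so `==` still needs the parity check. All names below are hypothetical.

```haskell
-- Flags set by ucomisd when comparing a with b (Intel order: ucomisd a, b).
data Flags = Flags { zf, pf, cf :: Bool } deriving Show

compareFlags :: Double -> Double -> Flags
compareFlags a b
  | isNaN a || isNaN b = Flags True True True     -- unordered
  | a > b              = Flags False False False
  | a < b              = Flags False False True
  | otherwise          = Flags True False False   -- equal

-- ja (CF = 0 and ZF = 0) and jae (CF = 0): no parity test needed,
-- because the unordered case always sets CF.
above, aboveOrEqual :: Flags -> Bool
above f        = not (cf f) && not (zf f)
aboveOrEqual f = not (cf f)

gtD, geD, ltD, leD :: Double -> Double -> Bool
gtD a b = above (compareFlags a b)
geD a b = aboveOrEqual (compareFlags a b)
ltD a b = above (compareFlags b a)         -- a < b  <=> b > a
leD a b = aboveOrEqual (compareFlags b a)  -- a <= b <=> b >= a
```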
Test Plan: ci/nofib
Reviewers: carter, bgamari, alpmestan
Reviewed By: alpmestan
Subscribers: alpmestan, simonpj, jmct, rwbarton, thomie
GHC Trac Issues: #15196
Differential Revision: https://phabricator.haskell.org/D4990
Summary:
This diff is part of a bigger project whose goal is to improve
support for common profiling tools (perf) for GHC binaries.
A similar job was already done and reverted in the past:
* https://phabricator.haskell.org/rGHCb1f453e16f0ce11a2ab18cc4c350bdcbd36299a6
* https://phabricator.haskell.org/rGHCf1f3c4f50650110ad0f700d6566a44c515b0548f
Reasoning:
`Perf` and similar tools build an in-memory symbol table from the .symtab
section of the ELF file to display human-readable function names instead
of the addresses in the output. `Perf` uses only two types of symbols:
`@function` and `@notype`, but GHC cannot produce any
`@function` symbols so the `perf` output is pretty useless (All the
haskell symbols that you can see in `perf` now are `@notype` internal
symbols extracted by mistake/hack).
The changes:
* mark code related symbols as @function
* small hack to mark InfoTable symbols as code if TABLES_NEXT_TO_CODE is true
Limitations:
* The perf symbolization support is not complete after this patch but
I'm working on the second patch.
* Constructor symbols are not supported. To fix that we can issue extra
local symbols which mark code sections as code and will be only used
for debug.
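Here is a sketch of the directive change. It assumes code labels get `@function` while data labels keep `@object`. `SymKind`, `typeDirective` and `infoTableKind` are hypothetical names. The last function mirrors the `TABLES_NEXT_TO_CODE` hack: with tables next to code, the info table label points at the entry code itself, so it is marked as code.

```haskell
-- Hypothetical classification of the symbols GHC emits.
data SymKind = Code | Data deriving (Eq, Show)

-- ELF .type directive for a symbol: perf only symbolizes @function
-- (and @notype) symbols, so code labels must be marked @function.
typeDirective :: String -> SymKind -> String
typeDirective sym kind = ".type " ++ sym ++ ", " ++ ty
  where
    ty = case kind of
      Code -> "@function"
      Data -> "@object"

-- With TABLES_NEXT_TO_CODE the info table label points at the entry
-- code, so it is treated as code; otherwise it labels plain data.
infoTableKind :: Bool -> SymKind
infoTableKind tablesNextToCode
  | tablesNextToCode = Code
  | otherwise        = Data
```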
Test Plan:
tests
any additional ideas?
Perf output on stock ghc 8.4.1:
```
9.78% FibbSlow FibbSlow [.] ckY_info
9.59% FibbSlow FibbSlow [.] cjqd_info
7.17% FibbSlow FibbSlow [.] c3sg_info
6.62% FibbSlow FibbSlow [.] c1X_info
5.32% FibbSlow FibbSlow [.] cjsX_info
4.18% FibbSlow FibbSlow [.] s3rN_info
3.82% FibbSlow FibbSlow [.] c2m_info
3.68% FibbSlow FibbSlow [.] cjlJ_info
3.26% FibbSlow FibbSlow [.] c3sb_info
3.19% FibbSlow FibbSlow [.] cjPQ_info
3.05% FibbSlow FibbSlow [.] cjQd_info
2.97% FibbSlow FibbSlow [.] cjAB_info
2.78% FibbSlow FibbSlow [.] cjzP_info
2.40% FibbSlow FibbSlow [.] cjOS_info
2.38% FibbSlow FibbSlow [.] s3rK_info
2.27% FibbSlow FibbSlow [.] cjq0_info
2.18% FibbSlow FibbSlow [.] cKQ_info
2.13% FibbSlow FibbSlow [.] cjSl_info
1.99% FibbSlow FibbSlow [.] s3rL_info
1.98% FibbSlow FibbSlow [.] c2cC_info
1.80% FibbSlow FibbSlow [.] s3rO_info
1.37% FibbSlow FibbSlow [.] c2f2_info
...
```
Perf output on patched ghc:
```
7.97% FibbSlow FibbSlow [.] c3rM_info
6.75% FibbSlow FibbSlow [.] 0x000000000032cfa8
6.63% FibbSlow FibbSlow [.] cifA_info
4.98% FibbSlow FibbSlow [.] integerzmgmp_GHCziIntegerziType_eqIntegerzh_info
4.55% FibbSlow FibbSlow [.] chXn_info
4.52% FibbSlow FibbSlow [.] c3rH_info
4.45% FibbSlow FibbSlow [.] chZB_info
4.04% FibbSlow FibbSlow [.] Main_fibbzuslow_info
4.03% FibbSlow FibbSlow [.] stg_ap_0_fast
3.76% FibbSlow FibbSlow [.] chXA_info
3.67% FibbSlow FibbSlow [.] cifu_info
3.25% FibbSlow FibbSlow [.] ci4r_info
2.64% FibbSlow FibbSlow [.] s3rf_info
2.42% FibbSlow FibbSlow [.] s3rg_info
2.39% FibbSlow FibbSlow [.] integerzmgmp_GHCziIntegerziType_eqInteger_info
2.25% FibbSlow FibbSlow [.] integerzmgmp_GHCziIntegerziType_minusInteger_info
2.17% FibbSlow FibbSlow [.] ghczmprim_GHCziClasses_zeze_info
2.09% FibbSlow FibbSlow [.] cicc_info
2.03% FibbSlow FibbSlow [.] 0x0000000000331e15
2.02% FibbSlow FibbSlow [.] s3ri_info
1.91% FibbSlow FibbSlow [.] 0x0000000000331bb8
1.89% FibbSlow FibbSlow [.] ci4N_info
...
```
Reviewers: simonmar, niteria, bgamari, goldfire
Reviewed By: simonmar, bgamari
Subscribers: lelf, rwbarton, thomie, carter
GHC Trac Issues: #15501
Differential Revision: https://phabricator.haskell.org/D4713
Reviewers: hvr, bgamari, simonmar, jrtc27
Reviewed By: bgamari
Subscribers: alpmestan, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D5034
This patch adds foldl' to GhcPrelude and changes most occurrences
of foldl to foldl'. This leads to better performance, especially
for quick builds where GHC does not perform strictness analysis.
It does change strictness behaviour when we use foldl' to turn
an argument list into function applications. But this is only a
drawback if code looks ONLY at the last argument but not at the first.
And, as the benchmarks show, it leads to fewer allocations in practice
at O2.
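The following illustrates both points and needs nothing beyond `base`. `foldl'` forces the accumulator at each step. A fold whose step function only looks at the latest element therefore behaves differently when an intermediate accumulator is undefined. `sumStrict`, `keepLast` and `forcesIntermediate` are hypothetical names.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import Data.List (foldl')

-- Strict left fold: the accumulator is forced at every step, so no chain
-- of (+) thunks builds up.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- A fold whose step function ignores the accumulator and keeps only the
-- latest element.
keepLast :: ((Int -> Int -> Int) -> Int -> [Int] -> Int) -> [Int] -> Int
keepLast fold = fold (\_ x -> x) 0

-- True if the given fold forces an undefined intermediate accumulator:
-- foldl never looks at it, foldl' evaluates it and throws.
forcesIntermediate :: ((Int -> Int -> Int) -> Int -> [Int] -> Int) -> IO Bool
forcesIntermediate fold = do
  r <- try (evaluate (keepLast fold [undefined, 5])) :: IO (Either ErrorCall Int)
  pure (either (const True) (const False) r)
```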
Compiler performance for Nofib:
O2 Allocations:
-1 s.d. ----- -0.0%
+1 s.d. ----- -0.0%
Average ----- -0.0%
O2 Compile Time:
-1 s.d. ----- -2.8%
+1 s.d. ----- +1.3%
Average ----- -0.8%
O0 Allocations:
-1 s.d. ----- -0.2%
+1 s.d. ----- -0.1%
Average ----- -0.2%
Test Plan: ci
Reviewers: goldfire, bgamari, simonmar, tdammers, monoidal
Reviewed By: bgamari, monoidal
Subscribers: tdammers, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4929
Summary:
This contains two commits:
----
Make GHC's code-base compatible w/ `MonadFail`
There were a couple of use-sites which implicitly used pattern-matches
in `do`-notation even though the underlying `Monad` didn't explicitly
support `fail`
This refactoring turns those use-sites into explicit case
discriminations and adds a `MonadFail` instance for `UniqSM`
(`UniqSM` was the worst offender so this has been postponed for a
follow-up refactoring)
---
Turn on MonadFail desugaring by default
This finally implements the phase scheduled for GHC 8.6 according to
https://prime.haskell.org/wiki/Libraries/Proposals/MonadFail#Transitionalstrategy
This also preserves some tests that assumed MonadFail desugaring to be
active; all ghc boot libs were already made compatible with this
`MonadFail` long ago, so no changes were needed there.
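Here is a minimal example of the rewrite described above, with hypothetical names. `Maybe` has a `MonadFail` instance, so a refutable pattern in `do`-notation is fine. `Identity` has none, so the pattern has to become an explicit `case`.

```haskell
import Data.Functor.Identity (Identity (..))

-- Maybe is a MonadFail, so a failing pattern simply yields Nothing.
firstTwo :: [a] -> Maybe (a, a)
firstTwo xs = do
  (a : b : _) <- Just xs
  pure (a, b)

-- Identity has no MonadFail instance: with MonadFail desugaring,
-- `(y : _) <- Identity xs` would be rejected, so bind the whole value
-- and discriminate explicitly with case.
headOr :: a -> [a] -> Identity a
headOr def xs = do
  ys <- Identity xs
  case ys of
    (y : _) -> pure y
    []      -> pure def
```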
Test Plan: Locally performed ./validate --fast
Reviewers: bgamari, simonmar, jrtc27, RyanGlScott
Reviewed By: bgamari
Subscribers: bgamari, RyanGlScott, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D5028
Summary:
On Windows one is not allowed to drop the stack by more than a page size.
The reason for this is that the OS only allocates enough stack till what
the TEB specifies. After that a guard page is placed and the rest of the
virtual address space is unmapped.
The intention is that doing stack allocations will cause you to hit the
guard which will then map the next page in and move the guard. This is
done to prevent what in the Linux world is known as stack clash
vulnerabilities https://access.redhat.com/security/cve/cve-2017-1000364.
There are modules in GHC for which the liveness analysis thinks the
reserved 8KB of spill slots isn't enough. One being DynFlags and the
other being Cabal.
Though I think the Cabal one is likely a bug:
```
4d6544: 81 ec 00 46 00 00 sub $0x4600,%esp
4d654a: 8d 85 94 fe ff ff lea -0x16c(%ebp),%eax
4d6550: 3b 83 1c 03 00 00 cmp 0x31c(%ebx),%eax
4d6556: 0f 82 de 8d 02 00 jb 4ff33a <_cLpg_info+0x7a>
4d655c: c7 45 fc 14 3d 50 00 movl $0x503d14,-0x4(%ebp)
4d6563: 8b 75 0c mov 0xc(%ebp),%esi
4d6566: 83 c5 fc add $0xfffffffc,%ebp
4d6569: 66 f7 c6 03 00 test $0x3,%si
4d656e: 0f 85 a6 d7 02 00 jne 503d1a <_cLpb_info+0x6>
4d6574: 81 c4 00 46 00 00 add $0x4600,%esp
```
It allocates nearly 18KB of spill slots for a simple 4 line function
and doesn't even use it. Note that this doesn't happen on x64 or
when making a validate build; it only happens when building without
validate and without a build.mk.
This and the allocation in DynFlags means the stack allocation will jump
over the guard page into unmapped memory areas and GHC or an end program
segfaults.
The pagesize on x86 Windows is 4KB which means we hit it very easily for
these two modules, which explains the total DOA of GHC 32bit for the past
3 releases and the "random" segfaults on Windows.
```
0:000> bp 00503d29
0:000> gn
Breakpoint 0 hit
WARNING: Stack overflow detected. The unwound frames are extracted from outside
normal stack bounds.
eax=03b6b9c9 ebx=00dc90f0 ecx=03cac48c edx=03cac43d esi=03b6b9c9 edi=03abef40
eip=00503d29 esp=013e96fc ebp=03cf8f70 iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202
setup+0x103d29:
00503d29 89442440 mov dword ptr [esp+40h],eax ss:002b:013e973c=????????
WARNING: Stack overflow detected. The unwound frames are extracted from outside
normal stack bounds.
WARNING: Stack overflow detected. The unwound frames are extracted from outside
normal stack bounds.
0:000> !teb
TEB at 00384000
ExceptionList: 013effcc
StackBase: 013f0000
StackLimit: 013eb000
```
This doesn't fix the liveness analysis but does fix the allocations, by
emitting a function call to `__chkstk_ms` when doing allocations of larger
than a page, this will make sure the stack is probed every page so the kernel
maps in the next page.
`__chkstk_ms` is provided by `libGCC`, which is under the
`GNU runtime exclusion license`, so it's safe to link against it, even for
proprietary code. (Technically we already do since we link compiled C code in.)
For allocations smaller than a page we drop the stack and probe the new address.
This avoids the function call and still makes sure we hit the guard if needed.
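The probing pattern can be modelled as the list of offsets below the old stack pointer that get touched, in order. This is a conceptual sketch, not `__chkstk_ms` itself. `probeOffsets` is a hypothetical name, and the 4KB page size in the example comes from the text above. Each probe lies at most one page below the previous one, so the guard page is always hit before any unmapped page.

```haskell
-- Offsets (bytes below the current stack pointer) touched, in order, when
-- reserving `size` bytes with one probe per page. Allocations smaller than
-- a page get a single probe at the new stack pointer.
probeOffsets :: Int -> Int -> [Int]
probeOffsets pageSize size
  | size <= 0 = []
  | otherwise = takeWhile (< size) [pageSize, 2 * pageSize ..] ++ [size]
```

With a 4KB page, the `sub $0x4600,%esp` from the Cabal example would touch offsets 4096, 8192, 12288, 16384 and finally 17920.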
PS: In case anyone is wondering why we didn't notice this before, it's because we
only test x86_64 and on Windows 10. On x86_64 the page size is 8KB and also the
kernel is a bit more lenient on Windows 10 in that it seems to catch the segfault
and resize the stack if it was unmapped:
```
0:000> t
eax=03b6b9c9 ebx=00dc90f0 ecx=03cac48c edx=03cac43d esi=03b6b9c9 edi=03abef40
eip=00503d2d esp=013e96fc ebp=03cf8f70 iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202
setup+0x103d2d:
00503d2d 8b461b mov eax,dword ptr [esi+1Bh] ds:002b:03b6b9e4=03cac431
0:000> !teb
TEB at 00384000
ExceptionList: 013effcc
StackBase: 013f0000
StackLimit: 013e9000
```
Likely Windows 10 has a guard page larger than previous versions.
This fixes the stack allocations, and as soon as I get the time I will look at
the liveness analysis. I find it highly unlikely that a simple Cabal function
requires ~2200 spill slots.
Test Plan: ./validate
Reviewers: simonmar, bgamari
Reviewed By: bgamari
Subscribers: AndreasK, rwbarton, thomie, carter
GHC Trac Issues: #15154
Differential Revision: https://phabricator.haskell.org/D4917