| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes all CafInfo predictions and various hacks to preserve
predicted CafInfos from the compiler and assigns final CafInfos to
interface Ids after code generation. SRT analysis is extended to support
static data, and Cmm generator is modified to allow generating
static_link fields after SRT analysis.
This also fixes `-fcatch-bottoms`, which introduces error calls in case
expressions in CorePrep, which runs *after* CoreTidy (which is where we
decide on CafInfos) and turns previously non-CAFFY things into CAFFY.
Fixes #17648
Fixes #9718
Evaluation
==========
NoFib
-----
Boot with: `make boot mode=fast`
Run: `make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1`
--------------------------------------------------------------------------------
Program Size Allocs Instrs Reads Writes
--------------------------------------------------------------------------------
CS -0.0% 0.0% -0.0% -0.0% -0.0%
CSD -0.0% 0.0% -0.0% -0.0% -0.0%
FS -0.0% 0.0% -0.0% -0.0% -0.0%
S -0.0% 0.0% -0.0% -0.0% -0.0%
VS -0.0% 0.0% -0.0% -0.0% -0.0%
VSD -0.0% 0.0% -0.0% -0.0% -0.5%
VSM -0.0% 0.0% -0.0% -0.0% -0.0%
anna -0.1% 0.0% -0.0% -0.0% -0.0%
ansi -0.0% 0.0% -0.0% -0.0% -0.0%
atom -0.0% 0.0% -0.0% -0.0% -0.0%
awards -0.0% 0.0% -0.0% -0.0% -0.0%
banner -0.0% 0.0% -0.0% -0.0% -0.0%
bernouilli -0.0% 0.0% -0.0% -0.0% -0.0%
binary-trees -0.0% 0.0% -0.0% -0.0% -0.0%
boyer -0.0% 0.0% -0.0% -0.0% -0.0%
boyer2 -0.0% 0.0% -0.0% -0.0% -0.0%
bspt -0.0% 0.0% -0.0% -0.0% -0.0%
cacheprof -0.0% 0.0% -0.0% -0.0% -0.0%
calendar -0.0% 0.0% -0.0% -0.0% -0.0%
cichelli -0.0% 0.0% -0.0% -0.0% -0.0%
circsim -0.0% 0.0% -0.0% -0.0% -0.0%
clausify -0.0% 0.0% -0.0% -0.0% -0.0%
comp_lab_zift -0.0% 0.0% -0.0% -0.0% -0.0%
compress -0.0% 0.0% -0.0% -0.0% -0.0%
compress2 -0.0% 0.0% -0.0% -0.0% -0.0%
constraints -0.0% 0.0% -0.0% -0.0% -0.0%
cryptarithm1 -0.0% 0.0% -0.0% -0.0% -0.0%
cryptarithm2 -0.0% 0.0% -0.0% -0.0% -0.0%
cse -0.0% 0.0% -0.0% -0.0% -0.0%
digits-of-e1 -0.0% 0.0% -0.0% -0.0% -0.0%
digits-of-e2 -0.0% 0.0% -0.0% -0.0% -0.0%
dom-lt -0.0% 0.0% -0.0% -0.0% -0.0%
eliza -0.0% 0.0% -0.0% -0.0% -0.0%
event -0.0% 0.0% -0.0% -0.0% -0.0%
exact-reals -0.0% 0.0% -0.0% -0.0% -0.0%
exp3_8 -0.0% 0.0% -0.0% -0.0% -0.0%
expert -0.0% 0.0% -0.0% -0.0% -0.0%
fannkuch-redux -0.0% 0.0% -0.0% -0.0% -0.0%
fasta -0.0% 0.0% -0.0% -0.0% -0.0%
fem -0.0% 0.0% -0.0% -0.0% -0.0%
fft -0.0% 0.0% -0.0% -0.0% -0.0%
fft2 -0.0% 0.0% -0.0% -0.0% -0.0%
fibheaps -0.0% 0.0% -0.0% -0.0% -0.0%
fish -0.0% 0.0% -0.0% -0.0% -0.0%
fluid -0.1% 0.0% -0.0% -0.0% -0.0%
fulsom -0.0% 0.0% -0.0% -0.0% -0.0%
gamteb -0.0% 0.0% -0.0% -0.0% -0.0%
gcd -0.0% 0.0% -0.0% -0.0% -0.0%
gen_regexps -0.0% 0.0% -0.0% -0.0% -0.0%
genfft -0.0% 0.0% -0.0% -0.0% -0.0%
gg -0.0% 0.0% -0.0% -0.0% -0.0%
grep -0.0% 0.0% -0.0% -0.0% -0.0%
hidden -0.0% 0.0% -0.0% -0.0% -0.0%
hpg -0.1% 0.0% -0.0% -0.0% -0.0%
ida -0.0% 0.0% -0.0% -0.0% -0.0%
infer -0.0% 0.0% -0.0% -0.0% -0.0%
integer -0.0% 0.0% -0.0% -0.0% -0.0%
integrate -0.0% 0.0% -0.0% -0.0% -0.0%
k-nucleotide -0.0% 0.0% -0.0% -0.0% -0.0%
kahan -0.0% 0.0% -0.0% -0.0% -0.0%
knights -0.0% 0.0% -0.0% -0.0% -0.0%
lambda -0.0% 0.0% -0.0% -0.0% -0.0%
last-piece -0.0% 0.0% -0.0% -0.0% -0.0%
lcss -0.0% 0.0% -0.0% -0.0% -0.0%
life -0.0% 0.0% -0.0% -0.0% -0.0%
lift -0.0% 0.0% -0.0% -0.0% -0.0%
linear -0.1% 0.0% -0.0% -0.0% -0.0%
listcompr -0.0% 0.0% -0.0% -0.0% -0.0%
listcopy -0.0% 0.0% -0.0% -0.0% -0.0%
maillist -0.0% 0.0% -0.0% -0.0% -0.0%
mandel -0.0% 0.0% -0.0% -0.0% -0.0%
mandel2 -0.0% 0.0% -0.0% -0.0% -0.0%
mate -0.0% 0.0% -0.0% -0.0% -0.0%
minimax -0.0% 0.0% -0.0% -0.0% -0.0%
mkhprog -0.0% 0.0% -0.0% -0.0% -0.0%
multiplier -0.0% 0.0% -0.0% -0.0% -0.0%
n-body -0.0% 0.0% -0.0% -0.0% -0.0%
nucleic2 -0.0% 0.0% -0.0% -0.0% -0.0%
para -0.0% 0.0% -0.0% -0.0% -0.0%
paraffins -0.0% 0.0% -0.0% -0.0% -0.0%
parser -0.1% 0.0% -0.0% -0.0% -0.0%
parstof -0.1% 0.0% -0.0% -0.0% -0.0%
pic -0.0% 0.0% -0.0% -0.0% -0.0%
pidigits -0.0% 0.0% -0.0% -0.0% -0.0%
power -0.0% 0.0% -0.0% -0.0% -0.0%
pretty -0.0% 0.0% -0.3% -0.4% -0.4%
primes -0.0% 0.0% -0.0% -0.0% -0.0%
primetest -0.0% 0.0% -0.0% -0.0% -0.0%
prolog -0.0% 0.0% -0.0% -0.0% -0.0%
puzzle -0.0% 0.0% -0.0% -0.0% -0.0%
queens -0.0% 0.0% -0.0% -0.0% -0.0%
reptile -0.0% 0.0% -0.0% -0.0% -0.0%
reverse-complem -0.0% 0.0% -0.0% -0.0% -0.0%
rewrite -0.0% 0.0% -0.0% -0.0% -0.0%
rfib -0.0% 0.0% -0.0% -0.0% -0.0%
rsa -0.0% 0.0% -0.0% -0.0% -0.0%
scc -0.0% 0.0% -0.3% -0.5% -0.4%
sched -0.0% 0.0% -0.0% -0.0% -0.0%
scs -0.0% 0.0% -0.0% -0.0% -0.0%
simple -0.1% 0.0% -0.0% -0.0% -0.0%
solid -0.0% 0.0% -0.0% -0.0% -0.0%
sorting -0.0% 0.0% -0.0% -0.0% -0.0%
spectral-norm -0.0% 0.0% -0.0% -0.0% -0.0%
sphere -0.0% 0.0% -0.0% -0.0% -0.0%
symalg -0.0% 0.0% -0.0% -0.0% -0.0%
tak -0.0% 0.0% -0.0% -0.0% -0.0%
transform -0.0% 0.0% -0.0% -0.0% -0.0%
treejoin -0.0% 0.0% -0.0% -0.0% -0.0%
typecheck -0.0% 0.0% -0.0% -0.0% -0.0%
veritas -0.0% 0.0% -0.0% -0.0% -0.0%
wang -0.0% 0.0% -0.0% -0.0% -0.0%
wave4main -0.0% 0.0% -0.0% -0.0% -0.0%
wheel-sieve1 -0.0% 0.0% -0.0% -0.0% -0.0%
wheel-sieve2 -0.0% 0.0% -0.0% -0.0% -0.0%
x2n1 -0.0% 0.0% -0.0% -0.0% -0.0%
--------------------------------------------------------------------------------
Min -0.1% 0.0% -0.3% -0.5% -0.5%
Max -0.0% 0.0% -0.0% -0.0% -0.0%
Geometric Mean -0.0% -0.0% -0.0% -0.0% -0.0%
--------------------------------------------------------------------------------
Program Size Allocs Instrs Reads Writes
--------------------------------------------------------------------------------
circsim -0.1% 0.0% -0.0% -0.0% -0.0%
constraints -0.0% 0.0% -0.0% -0.0% -0.0%
fibheaps -0.0% 0.0% -0.0% -0.0% -0.0%
gc_bench -0.0% 0.0% -0.0% -0.0% -0.0%
hash -0.0% 0.0% -0.0% -0.0% -0.0%
lcss -0.0% 0.0% -0.0% -0.0% -0.0%
power -0.0% 0.0% -0.0% -0.0% -0.0%
spellcheck -0.0% 0.0% -0.0% -0.0% -0.0%
--------------------------------------------------------------------------------
Min -0.1% 0.0% -0.0% -0.0% -0.0%
Max -0.0% 0.0% -0.0% -0.0% -0.0%
Geometric Mean -0.0% +0.0% -0.0% -0.0% -0.0%
Manual inspection of programs in testsuite/tests/programs
---------------------------------------------------------
I built these programs with a bunch of dump flags and `-O` and compared
STG, Cmm, and Asm dumps and file sizes.
(Below the numbers in parenthesis show number of modules in the program)
These programs have identical compiler (same .hi and .o sizes, STG, and
Cmm and Asm dumps):
- Queens (1), andre_monad (1), cholewo-eval (2), cvh_unboxing (3),
andy_cherry (7), fun_insts (1), hs-boot (4), fast2haskell (2),
jl_defaults (1), jq_readsPrec (1), jules_xref (1), jtod_circint (4),
jules_xref2 (1), lennart_range (1), lex (1), life_space_leak (1),
bargon-mangler-bug (7), record_upd (1), rittri (1), sanders_array (1),
strict_anns (1), thurston-module-arith (2), okeefe_neural (1),
joao-circular (6), 10queens (1)
Programs with different compiler outputs:
- jl_defaults (1): For some reason GHC HEAD marks a lot of top-level
`[Int]` closures as CAFFY for no reason. With this patch we no longer
make them CAFFY and generate less SRT entries. For some reason Main.o
is slightly larger with this patch (1.3%) and the executable sizes are
the same. (I'd expect both to be smaller)
- launchbury (1): Same as jl_defaults: top-level `[Int]` closures marked
as CAFFY for no reason. Similarly `Main.o` is 1.4% larger but the
executable sizes are the same.
- galois_raytrace (13): Differences are in the Parse module. There are a
lot, but some of the changes are caused by the fact that for some
reason (I think a bug) GHC HEAD marks the dictionary for `Functor
Identity` as CAFFY. Parse.o is 0.4% larger, the executable size is the
same.
- north_array: We now generate less SRT entries because some of array
primops used in this program like `NewArrayOp` get eliminated during
Stg-to-Cmm and turn some CAFFY things into non-CAFFY. Main.o gets 24%
larger (9224 bytes from 9000 bytes), executable sizes are the same.
- seward-space-leak: Difference in this program is better shown by this
smaller example:
module Lib where
data CDS
= Case [CDS] [(Int, CDS)]
| Call CDS CDS
instance Eq CDS where
Case sels1 rets1 == Case sels2 rets2 =
sels1 == sels2 && rets1 == rets2
Call a1 b1 == Call a2 b2 =
a1 == a2 && b1 == b2
_ == _ =
False
In this program GHC HEAD builds a new SRT for the recursive group of
`(==)`, `(/=)` and the dictionary closure. Then `/=` points to `==`
in its SRT field, and `==` uses the SRT object as its SRT. With this
patch we use the closure for `/=` as the SRT and add `==` there. Then
`/=` gets an empty SRT field and `==` points to `/=` in its SRT
field.
This change looks fine to me.
Main.o gets 0.07% larger, executable sizes are identical.
head.hackage
------------
head.hackage's CI script builds 428 packages from Hackage using this
patch with no failures.
Compiler performance
--------------------
The compiler perf tests report that the compiler allocates slightly more
(worst case observed so far is 4%). However most programs in the test
suite are small, single file programs. To benchmark compiler performance
on something more realistic I build Cabal (the library, 236 modules)
with different optimisation levels. For the "max residency" row I run
GHC with `+RTS -s -A100k -i0 -h` for more accurate numbers. Other rows
are generated with just `-s`. (This is because `-i0` causes running GC
much more frequently and as a result "bytes copied" gets inflated by
more than 25x in some cases)
* -O0
| | GHC HEAD | This MR | Diff |
| --------------- | -------------- | -------------- | ------ |
| Bytes allocated | 54,413,350,872 | 54,701,099,464 | +0.52% |
| Bytes copied | 4,926,037,184 | 4,990,638,760 | +1.31% |
| Max residency | 421,225,624 | 424,324,264 | +0.73% |
* -O1
| | GHC HEAD | This MR | Diff |
| --------------- | --------------- | --------------- | ------ |
| Bytes allocated | 245,849,209,992 | 246,562,088,672 | +0.28% |
| Bytes copied | 26,943,452,560 | 27,089,972,296 | +0.54% |
| Max residency | 982,643,440 | 991,663,432 | +0.91% |
* -O2
| | GHC HEAD | This MR | Diff |
| --------------- | --------------- | --------------- | ------ |
| Bytes allocated | 291,044,511,408 | 291,863,910,912 | +0.28% |
| Bytes copied | 37,044,237,616 | 36,121,690,472 | -2.49% |
| Max residency | 1,071,600,328 | 1,086,396,256 | +1.38% |
Extra compiler allocations
--------------------------
Runtime allocations of programs are as reported above (NoFib section).
The compiler now allocates more than before. Main source of allocation
in this patch compared to base commit is the new SRT algorithm
(GHC.Cmm.Info.Build). Below is some of the extra work we do with this
patch, numbers generated by profiled stage 2 compiler when building a
pathological case (the test 'ManyConstructors') with '-O2':
- We now sort the final STG for a module, which means traversing the
entire program, generating free variable set for each top-level
binding, doing SCC analysis, and re-ordering the program. In
ManyConstructors this step allocates 97,889,952 bytes.
- We now do SRT analysis on static data, which in a program like
ManyConstructors causes analysing 10,000 bindings that we would
previously just skip. This step allocates 70,898,352 bytes.
- We now maintain an SRT map for the entire module as we compile Cmm
groups:
data ModuleSRTInfo = ModuleSRTInfo
{ ...
, moduleSRTMap :: SRTMap
}
(SRTMap is just a strict Map from the 'containers' library)
This map gets an entry for most bindings in a module (exceptions are
THUNKs and CAFFY static functions). For ManyConstructors this map
gets 50015 entries.
- Once we're done with code generation we generate a NameSet from SRTMap
for the non-CAFFY names in the current module. This set gets the same
number of entries as the SRTMap.
- Finally we update CafInfos in ModDetails for the non-CAFFY Ids, using
the NameSet generated in the previous step. This usually does the
least amount of allocation among the work listed here.
Only place with this patch where we do less work in the CAF analysis in
the tidying pass (CoreTidy). However that doesn't save us much, as the
pass still needs to traverse the whole program and update IdInfos for
other reasons. Only thing we don't here do is the `hasCafRefs` pass over
the RHS of bindings, which is a stateless pass that returns a boolean
value, so it doesn't allocate much.
(Metric changes blow are all increased allocations)
Metric changes
--------------
Metric Increase:
ManyAlternatives
ManyConstructors
T13035
T14683
T1969
T9961
|
|
|
|
|
|
| |
incomplete-uni-patterns and incomplete-record-updates will be in -Wall at a
future date, so prepare for that by disabling those warnings on files that
trigger them.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes the calling convention for functions passing raw SSE-register
values by adding padding as needed to get the values in the right
registers. This problem cropped up when some args were unused an dropped
from the live list.
This folds together 2e23e1c7de01c92b038e55ce53d11bf9db993dd4 and
73273be476a8cc6c13368660b042b3b0614fd928 previously from @kavon.
Metric Increase:
T12707
ManyConstructors
|
| |
|
|
|
|
| |
Fixes #13904.
|
| |
|
| |
|
|
|
|
|
| |
This was a hack which is no longer necessary now since we introduce a
dedicated entry block for each procedure.
|
|
|
|
|
|
|
|
|
|
| |
The LLVM IR forbids the entry label of a procedure from having any
predecessors. In the case of a simple looping function the LLVM code
generator broke this invariant, as noted in #17589. Fix this by
moving the function prologue to its own basic block, as suggested by
@kavon in #11649.
Fixes #11649 and #17589.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Add 'dumpAction' hook to DynFlags.
It allows GHC API users to catch dumped intermediate codes and
information. The format of the dump (Core, Stg, raw text, etc.) is now
reported allowing easier automatic handling.
* Add 'traceAction' hook to DynFlags.
Some dumps go through the trace mechanism (for instance unfoldings that
have been considered for inlining). This is problematic because:
1) dumps aren't written into files even with -ddump-to-file on
2) dumps are written on stdout even with GHC API
3) in this specific case, dumping depends on unsafe globally stored
DynFlags which is bad for GHC API users
We introduce 'traceAction' hook which allows GHC API to catch those
traces and to avoid using globally stored DynFlags.
* Avoid dumping empty logs via dumpAction/traceAction (but still write
empty files to keep the existing behavior)
|
| |
|
|
|
|
|
| |
Get rid of CPP macro WORDS_BIGENDIAN which is not defined anymore, and
replace it by DynFlag. This fixes partially #17337.
|
|
|
|
|
| |
Metric Decrease:
T14683
|
|
|
|
|
| |
For s390x the GHC calling convention is only supported since LLVM
version 10. Issue a warning in case an older version of LLVM is used.
|
|
|
|
|
|
|
|
|
|
|
|
| |
19 times out of 20 we already have dynflags in scope.
We could just always use `return dflags`. But this is in fact not free.
When looking at some STG code I noticed that we always allocate a
closure for this expression in the heap. Clearly a waste in these cases.
For the other cases we can either just modify the callsite to
get dynflags or use the _D variants of withTiming I added which
will use getDynFlags under the hood.
|
|
|
|
|
|
| |
This patch adds support for the s390x architecture for the LLVM code
generator. The patch includes a register mapping of STG registers onto
s390x machine registers which enables a registerised build.
|
|
|
|
|
|
|
|
|
|
|
|
| |
As described in the new Note [LLVM Configuration] in SysTools, we now
load llvm-targets and llvm-passes lazily to avoid the overhead of doing
so when -fllvm isn't used (also known as "the common case").
Noticed in #17003.
Metric Decrease:
T12234
T12150
|
|
|
|
|
|
| |
Add StgToCmm module hierarchy. Platform modules that are used in several
other places (NCG, LLVM codegen, Cmm transformations) are put into
GHC.Platform.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There were two problems with LLVM version checking:
- The parser would only parse x and x.y formatted versions. E.g. 1.2.3
would be rejected.
- The version check was too strict and would reject x.y formatted
versions. E.g. when we support version 7 it'd reject 7.0 ("LLVM
version 7.0") and only accept 7 ("LLVM version 7").
We now parse versions with arbitrarily deep minor numbering (x.y.z.t...)
and accept versions as long as the major version matches the supported
version (e.g. 7.1, 7.1.2, 7.1.2.3 ...).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This generalizes code generators (outputAsm, outputLlvm, outputC, and
the call site codeOutput) so that they'll return the return values of
the passed Cmm streams.
This allows accumulating data during Cmm generation and returning it to
the call site in HscMain.
Previously the Cmm streams were assumed to return (), so the code
generators returned () as well.
This change is required by !1304 and !1530.
Skipping CI as this was tested before and I only updated the commit
message.
[skip ci]
|
|
|
|
|
|
|
|
|
| |
This adds a Stream.consume function, uses it in LLVM and C code
generators, and removes the use of Stream.collect function which was
used to collect streaming Cmm generation results into a list.
LLVM and C backends now properly use streamed Cmm generation, instead of
collecting Cmm groups into a list before generating LLVM/C code.
|
|
|
|
|
|
|
| |
Unfortunately this will require more work; register allocation is
quite broken.
This reverts commit acd795583625401c5554f8e04ec7efca18814011.
|
|
|
|
|
| |
Instead following @angerman's suggestion put them in the config file.
Maybe we could re-key llvm-targets someday, but this is good for now.
|
|
|
|
|
|
|
| |
This adds support for constructing vector types from Float#, Double# etc
and performing arithmetic operations on them
Cleaned-Up-By: Ben Gamari <ben@well-typed.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here the following changes are introduced:
- A read barrier machine op is added to Cmm.
- The order in which a closure's fields are read and written is changed.
- Memory barriers are added to RTS code to ensure correctness on
out-or-order machines with weak memory ordering.
Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
is lowered to an instruction that ensures memory reads that occur after said
instruction in program order are not performed before reads coming before said
instruction in program order. On machines with strong memory ordering properties
(e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
MO_ReadBarrier is simply erased. However, such an instruction is necessary on
weakly ordered machines, e.g. ARM and PowerPC.
Weam memory ordering has consequences for how closures are observed and mutated.
For example, consider a closure that needs to be updated to an indirection. In
order for the indirection to be safe for concurrent observers to enter, said
observers must read the indirection's info table before they read the
indirectee. Furthermore, the entering observer makes assumptions about the
closure based on its info table contents, e.g. an INFO_TYPE of IND imples the
closure has an indirectee pointer that is safe to follow.
When a closure is updated with an indirection, both its info table and its
indirectee must be written. With weak memory ordering, these two writes can be
arbitrarily reordered, and perhaps even interleaved with other threads' reads
and writes (in the absence of memory barrier instructions). Consider this
example of a bad reordering:
- An updater writes to a closure's info table (INFO_TYPE is now IND).
- A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
- A concurrent observer reads the closure's indirectee and enters it. (!!!)
- An updater writes the closure's indirectee.
Here the update to the indirectee comes too late and the concurrent observer has
jumped off into the abyss. Speculative execution can also cause us issues,
consider:
- An observer is about to case on a value in closure's info table.
- The observer speculatively reads one or more of closure's fields.
- An updater writes to closure's info table.
- The observer takes a branch based on the new info table value, but with the
old closure fields!
- The updater writes to the closure's other fields, but its too late.
Because of these effects, reads and writes to a closure's info table must be
ordered carefully with respect to reads and writes to the closure's other
fields, and memory barriers must be placed to ensure that reads and writes occur
in program order. Specifically, updates to a closure must follow the following
pattern:
- Update the closure's (non-info table) fields.
- Write barrier.
- Update the closure's info table.
Observing a closure's fields must follow the following pattern:
- Read the closure's info pointer.
- Read barrier.
- Read the closure's (non-info table) fields.
This patch updates RTS code to obey this pattern. This should fix long-standing
SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
out-of-order execution) and PowerPC. This fixes issue #15449.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
|
|
|
|
|
|
|
| |
LLVM version numberinf changed recently. Previously, releases were numbered
4.0, 5.0 and 6.0 but with version 7, they dropped the redundant ".0".
Fix requires for Llvm detection and some code.
|
|
|
|
|
|
|
| |
ghc-pkg needs to be aware of platforms so it can figure out which
subdire within the user package db to use. This is admittedly
roundabout, but maybe Cabal could use the same notion of a platform as
GHC to good affect too.
|
| |
|
|
|
|
|
| |
Previously log and exp were primitives yet log1p and expm1 were FFI
calls. Fix this non-uniformity.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. If GHC is to be multi-target, these cannot be baked in at compile
time.
2. Compile-time flags have a higher maintenance than run-time flags.
3. The old way makes build system implementation (various bootstrapping
details) with the thing being built. E.g. GHC doesn't need to care
about which integer library *will* be used---this is purely a crutch
so the build system doesn't need to pass flags later when using that
library.
4. Experience with cross compilation in Nixpkgs has shown things work
nicer when compiler's can *optionally* delegate the bootstrapping the
package manager. The package manager knows the entire end-goal build
plan, and thus can make top-down decisions on bootstrapping. GHC can
just worry about GHC, not even core library like base and ghc-prim!
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a new closure identifier is being established to a
local or exported closure already emitted into the same
module, refrain from adding an IND_STATIC closure, and
instead emit an assembly-language alias.
Inter-module IND_STATIC objects still remain, and need to be
addressed by other measures.
Binary-size savings on nofib are around 0.1%.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 Constructors
* makes -msse2 assumed/default for x86 platforms, fixing a long standing nondeterminism in rounding
behavior in 32bit haskell code
* removes the 80bit floating point representation from the supported float sizes
* theres still 1 tiny bit of x87 support needed,
for handling float and double return values in FFI calls wrt the C ABI on x86_32,
but this one piece does not leak into the rest of NCG.
* Lots of code thats not been touched in a long time got deleted as a
consequence of all of this
all in all, this change paves the way towards a lot of future further
improvements in how GHC handles floating point computations, along with
making the native code gen more accessible to a larger pool of contributors.
|
|
|
|
|
|
| |
This commit includes the necessary changes in code and
documentation to support a primop that reverses a word's
bits. It also includes a test.
|
|
|
|
|
| |
This moves all URL references to Trac tickets to their corresponding
GitLab counterparts.
|
|
|
|
| |
Also used ByteString in some other relevant places
|
| |
|
|
|
|
|
|
| |
The alias is of type i8, so its global variable name
should have type i8*. Anyway we should never deal
with pointers to (i8*)!
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes a fairly long-standing bug (dating back to 2015) in
RdrName.bestImport, namely
commit 9376249b6b78610db055a10d05f6592d6bbbea2f
Author: Simon Peyton Jones <simonpj@microsoft.com>
Date: Wed Oct 28 17:16:55 2015 +0000
Fix unused-import stuff in a better way
In that patch got the sense of the comparison back to front, and
thereby failed to implement the unused-import rules described in
Note [Choosing the best import declaration] in RdrName
This led to Trac #13064 and #15393
Fixing this bug revealed a bunch of unused imports in libraries;
the ones in the GHC repo are part of this commit.
The two important changes are
* Fix the bug in bestImport
* Modified the rules by adding (a) in
Note [Choosing the best import declaration] in RdrName
Reason: the previosu rules made Trac #5211 go bad again. And
the new rule (a) makes sense to me.
In unravalling this I also ended up doing a few other things
* Refactor RnNames.ImportDeclUsage to use a [GlobalRdrElt] for the
things that are used, rather than [AvailInfo]. This is simpler
and more direct.
* Rename greParentName to greParent_maybe, to follow GHC
naming conventions
* Delete dead code RdrName.greUsedRdrName
Bumps a few submodules.
Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5312
|
|
|
|
| |
remove local
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a previous patch we replaced some built-in literal constructors
(MachInt, MachWord, etc.) with a single LitNumber constructor.
In this patch we replace the `Mach` prefix of the remaining constructors
with `Lit` for consistency (e.g., LitChar, LitLabel, etc.).
Sadly the name `LitString` was already taken for a kind of FastString
and it would become misleading to have both `LitStr` (literal
constructor renamed after `MachStr`) and `LitString` (FastString
variant). Hence this patch renames the FastString variant `PtrString`
(which is more accurate) and the literal string constructor now uses the
least surprising `LitString` name.
Both `Literal` and `LitString/PtrString` have recently seen breaking
changes so doing this kind of renaming now shouldn't harm much.
Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27, tdammers
Subscribers: tdammers, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4881
|
|
|
|
| |
only concat once
|
| |
|
|
|
|
| |
This reverts commit adcb5fb47c0942671d409b940d8884daa9359ca4.
|
|
|
|
| |
This reverts commit d8495549ba9d194815c2d0eaee6797fc7c00756a.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes two isssues:
- Using bitcast for MO_XX_Conv
Arguments to a bitcast must be of the same size. We should be using
`trunc` and `zext` instead.
- Using unsupported MO_*_QuotRem for LLVM
The two primops `MO_*_QuotRem` are not supported by the LLVM backend,
so
we shouldn't use them for `Int8#`/`Word8#` (just as we do not use
them for
`Int#`/`Word#`).
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: manually run tests with WAY=llvm
Reviewers: bgamari, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, carter
GHC Trac Issues: #15864
Differential Revision: https://phabricator.haskell.org/D5304
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the first step of implementing:
https://github.com/ghc-proposals/ghc-proposals/pull/74
The main highlights/changes:
primops.txt.pp gets two new sections for two new primitive types for
signed and unsigned 8-bit integers (Int8# and Word8 respectively) along
with basic arithmetic and comparison operations. PrimRep/RuntimeRep get
two new constructors for them. All of the primops translate into the
existing MachOPs.
For CmmCalls the codegen will now zero-extend the values at call
site (so that they can be moved to the right register) and then truncate
them back their original width.
x86 native codegen needed some updates, since it wasn't able to deal
with the new widths, but all the changes are quite localized. LLVM
backend seems to just work.
This is the second attempt at merging this, after the first attempt in
D4475 had to be backed out due to regressions on i386.
Bumps binary submodule.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate (on both x86-{32,64})
Reviewers: bgamari, hvr, goldfire, simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5258
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We now calculate the SSE register padding needed to fix the calling
convention in LLVM in a robust way: grouping them by whether
registers in that class overlap (with the same class overlapping
itself).
My prior patch assumed that no matter the platform, physical
register Fx aliases with Dx, etc, for our calling convention.
This is unfortunately not the case for any platform except x86-64.
Test Plan:
Only know how to test on x86-64, but it should be tested on ARM with:
`make test WAYS=llvm && make test WAYS=optllvm`
Reviewers: bgamari, angerman
Reviewed By: bgamari
Subscribers: rwbarton, carter
GHC Trac Issues: #15780, #14251, #15747
Differential Revision: https://phabricator.haskell.org/D5254
|
|
|
|
|
|
|
|
|
| |
This unfortunately broke i386 support since it introduced references to
byte-sized registers that don't exist on that architecture.
Reverts binary submodule
This reverts commit 5d5307f943d7581d7013ffe20af22233273fba06.
|