summaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
* Counters for characterising mkTyConApp branchwip/mkTyConApp-countersBen Gamari2020-03-242-4/+69
|
* Enable -fexpose-all-symbols when debug level is setBen Gamari2020-03-182-3/+9
|
* codeGen: Produce local symbols for module-internal functionsBen Gamari2020-03-189-17/+83
| | | | | | | | | | | | | | | | | | | | It turns out that some important native debugging/profiling tools (e.g. perf) rely only on symbol tables for function name resolution (as opposed to using DWARF DIEs). However, previously GHC would emit temporary symbols (e.g. `.La42b`) to identify module-internal entities. Such symbols are dropped during linking and therefore not visible to runtime tools (in addition to having rather un-helpful unique names). For instance, `perf report` would often end up attributing all cost to the libc `frame_dummy` symbol since Haskell code was no covered by any proper symbol (see #17605). We now rather follow the model of C compilers and emit descriptively-named local symbols for module internal things. Since this will increase object file size this behavior can be disabled with the `-fno-expose-all-symbols` flag. With this `perf record` can finally be used against Haskell executables. Even more, with `-g3` `perf annotate` provides inline source code.
* Pass specialised utf8DecodeChar# to utf8DecodeLazy# for performanceDaniel Gröber2020-03-182-14/+12
| | | | | | | Currently we're passing a indexWord8OffAddr# type function to utf8DecodeLazy# which then passes it on to utf8DecodeChar#. By passing one of utf8DecodeCharAddr# or utf8DecodeCharByteArray# instead we benefit from the inlining and specialization already done for those.
* Use ShortByteString for FastStringDaniel Gröber2020-03-185-188/+192
| | | | | | | | | | | | | | | There are multiple reasons we want this: - Fewer allocations: ByteString has 3 fields, ShortByteString just has one. - ByteString memory is pinned: - This can cause fragmentation issues (see for example #13110) but also - makes using FastStrings in compact regions impossible. Metric Decrease: T5837 T12150 T12234 T12425
* Remove length field from FastStringDaniel Gröber2020-03-181-24/+24
|
* Make example collapsiblePaavo2020-03-171-0/+1
|
* Clean upPaavo2020-03-171-3/+2
|
* Add example for Data.Semigroup.diffPaavo2020-03-171-0/+15
|
* Don't update ModDetails with CafInfos when opts are disabledÖmer Sinan Ağacan2020-03-172-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is consistent with the interface file behavior where we omit HsNoCafRefs annotations with -fomit-interface-pragmas (implied by -O0). ModDetails and ModIface are just different representations of the same thing, so they really need to be in sync. This patch does the right thing and does not need too much explanation, but here's an example of a problem not doing this causes in !2842: -- MyInteger.hs module MyInteger ( MyInteger (MyInteger) , ToMyInteger (toMyInteger) ) where newtype MyInteger = MyInteger Integer class ToMyInteger a where toMyInteger :: a -> MyInteger instance ToMyInteger Integer where toMyInteger = MyInteger {- . succ -} -- Main.hs module Main ( main ) where import MyInteger (MyInteger (MyInteger), toMyInteger) main :: IO () main = do let (MyInteger i) = (id . toMyInteger) (41 :: Integer) print i If I build this with -O0, without this fix, we generate a ModDetails with accurate LFInfo for toMyInteger (MyInteger.$fToMyIntegerInteger) which says that it's a LFReEntrant with arity 1. This means in the use site (Main) we tag the value: R3 = MyInteger.$fToMyIntegerInteger_closure + 1; R2 = GHC.Base.id_closure; R1 = GHC.Base.._closure; Sp = Sp - 16; call stg_ap_ppp_fast(R4, R3, R2, R1) args: 24, res: 0, upd: 24; Now we change the definition by uncommenting the `succ` part and it becomes a thunk: MyInteger.$fToMyIntegerInteger [InlPrag=INLINE (sat-args=0)] :: MyInteger.ToMyInteger GHC.Integer.Type.Integer [GblId[DFunId(nt)]] = {} \u [] $ctoMyInteger_rEA; and its LFInfo is now LFThunk. This change in LFInfo makes a difference in the use site: we can no longer tag it. But becuase the interface fingerprint does not change (because ModIface does not change) we don't rebuild Main and tag the thunk. (1.2% increase in allocations when building T12545 on armv7 because we generate more code without CafInfos) Metric Increase: T12545
* Implement mapTyCo like foldTyCoSimon Peyton Jones2020-03-174-99/+113
| | | | | | | | | | | | | | | | | | | | | | | | This patch makes mapType use the successful idiom described in TyCoRep Note [Specialising foldType] I have not yet changed any functions to use mapType, though there may be some suitable candidates. This patch should be a no-op in terms of functionality but, because it inlines the mapper itself, I'm hoping that there may be some modest perf improvements. Metric Decrease: T5631 T5642 T3064 T9020 T14683 hie002 haddock.Cabal haddock.base haddock.compiler
* Document the units of -ddump-timingsMaximilian Tagher2020-03-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Right now, in the output of -ddump-timings to a file, you can't tell what the units are: ``` CodeGen [TemplateTestImports]: alloc=22454880 time=14.597 ``` I believe bytes/milliseconds are the correct units, but confirmation would be appreciated. I'm basing it off of this snippet from `withTiming'`: ``` when (verbosity dflags >= 2 && prtimings == PrintTimings) $ liftIO $ logInfo dflags (defaultUserStyle dflags) (text "!!!" <+> what <> colon <+> text "finished in" <+> doublePrec 2 time <+> text "milliseconds" <> comma <+> text "allocated" <+> doublePrec 3 (realToFrac alloc / 1024 / 1024) <+> text "megabytes") ``` which implies time is in milliseconds, and allocations in bytes (which divided by 1024 would be KB, and again would be MB)
* Don't use non-portable operator "==" in configure.acPHO2020-03-171-1/+1
| | | | | The test operator "==" is a Bash extension and produces a wrong result if /bin/sh is not Bash.
* Update sanity checking for TSOs:Ömer Sinan Ağacan2020-03-161-36/+22
| | | | | | | | | | | | | | | | - Remove an invalid assumption about GC checking what_next field. The GC doesn't care about what_next at all, if a TSO is reachable then all its pointers are followed (other than global_tso, which is only followed by compacting GC). - Remove checkSTACK in checkTSO: TSO stacks will be visited in checkHeapChain, or checkLargeObjects etc. - Add an assertion in checkTSO to check that the global_link field is sane. - Did some refactor to remove forward decls in checkGlobalTSOList and added braces around single-statement if statements.
* Modules: Core (#13009)Sylvain Henry2020-03-16225-1194/+1209
| | | | Update submodule: haddock
* base: add strict IO functions: readFile', getContents', hGetContents'Lysxia2020-03-168-2/+125
|
* Fix global_link of TSOs for threads reachable via dead weaksÖmer Sinan Ağacan2020-03-151-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes #17785 Here's how the problem occurs: - In generation 0 we have a TSO that is finished (i.e. it has no more work to do or it is killed). - The TSO only becomes reachable after collectDeadWeakPtrs(). - After collectDeadWeakPtrs() we switch to WeakDone phase where we don't move TSOs to different lists anymore (like the next gen's thread list or the resurrected_threads list). - So the TSO will never be moved to a generation's thread list, but it will be promoted to generation 1. - Generation 1 collected via mark-compact, and because the TSO is reachable it is marked, and its `global_link` field, which is bogus at this point (because the TSO is not in a list), will be threaded. - Chaos ensues. In other words, when these conditions hold: - A TSO is reachable only after collectDeadWeakPtrs() - It's finished (what_next is ThreadComplete or ThreadKilled) - It's retained by mark-compact collector (moving collector doesn't evacuate the global_list field) We end up doing random mutations on the heap because the TSO's global_list field is not valid, but it still looks like a heap pointer so we thread it during compacting GC. The fix is simple: when we traverse old_threads lists to resurrect unreachable threads the threads that won't be resurrected currently stays on the old_threads lists. Those threads will never be visited again by MarkWeak so we now reset the global_list fields. This way compacting GC does not thread pointers to nowhere. Testing ------- The reproducer in #17785 is quite large and hard to build, because of the dependencies, so I'm not adding a regression test. In my testing the reproducer would take a less than 5 seconds to run, and once in every ~5 runs would fail with a segfault or an assertion error. In other cases it also fails with a test failure. Because the tests never fail with the bug fix, assuming the code is correct, this also means that this bug can sometimes lead to incorrect runtime results. After the fix I was able to run the reproducer repeatedly for about an hour, with no runtime crashes or test failures. To run the reproducer clone the git repo: $ git clone https://github.com/osa1/streamly --branch ghc-segfault Then clone primitive and atomic-primops from their git repos and point to the clones in cabal.project.local. The project should then be buildable using GHC HEAD. Run the executable `properties` with `+RTS -c -DZ`. In addition to the reproducer above I run the test suite using: $ make slowtest EXTRA_HC_OPTS="-debug -with-rtsopts=-DS \ -with-rtsopts=-c +RTS -c -RTS" SKIPWAY='nonmoving nonmoving_thr' This enables compacting GC always in both GHC when building the test programs and when running the test programs, and also enables sanity checking when running the test programs. These set of flags are not compatible for all tests so there are some failures, but I got the same set of failures with this patch compared to GHC HEAD.
* Add a -no-haddock flag.Judah Jacobson2020-03-155-1/+26
| | | | | | | | This flag undoes the effect of a previous "-haddock" flag. Having both flags makes it easier for build systems to enable Haddock parsing in a set of global flags, but then disable it locally for specific targets (e.g., third-party packages whose comments don't pass the validation in the latest GHC). I added the flag to expected-undocumented-flags.txt since `-haddock` was alreadyin that list.
* Refactor CmmToAsm (disentangle DynFlags)Sylvain Henry2020-03-1542-1738/+2001
| | | | | | | | | | | | | | | | | | | | | This patch disentangles a bit more DynFlags from the native code generator (CmmToAsm). In more details: - add a new NCGConfig datatype in GHC.CmmToAsm.Config which contains the configuration of a native code generation session - explicitly pass NCGConfig/Platform arguments when necessary - as a consequence `sdocWithPlatform` is gone and there are only a few `sdocWithDynFlags` left - remove the use of `unsafeGlobalDynFlags` from GHC.CmmToAsm.CFG - remove `sdocDebugLevel` (now we pass the debug level via NCGConfig) There are still some places where DynFlags is used, especially because of pretty-printing (CLabel), because of Cmm helpers (such as `cmmExprType`) and because of `Outputable` instance for the instructions. These are left for future refactoring as this patch is already big.
* Note platform-specific Foreign.C.Types in contextViktor Dukhovni2020-03-151-4/+29
| | | | | Also fix the markup in the general note at the top of the module. Haddock (usability trade-off), does not support multi-line emphasised text.
* Remove some dead codeBrian Foley2020-03-159-100/+5
| | | | From the notes.ghc.drop list found using weeder in #17713
* Document restriction on SCC pragma syntaxKrzysztof Gogolewski2020-03-154-1/+10
| | | | | | | Currently, the names of cost centres must be quoted or be lowercase identifiers. Fixes #17916.
* Remove unused field var_inline (#17915)Krzysztof Gogolewski2020-03-145-44/+23
|
* rts: Fix printClosure when printing fwd ptrsÖmer Sinan Ağacan2020-03-141-1/+1
|
* Enable stage1 build of haddockCale Gibbard2020-03-141-1/+3
| | | | The submodule has already been bumped to contain the fix.
* Remove second tcLookupTcTyCon in tcDataDefnVladislav Zavialov2020-03-142-14/+14
| | | | | | | | | Before this patch, tcDataDefn used to call tcLookupTcTyCon twice in a row: 1. in bindTyClTyVars itself 2. in the continuation passed to it Now bindTyClTyVars passes the TcTyCon to the continuation, making the second lookup unnecessary.
* base: Make `open` calls interruptibleBen Gamari2020-03-142-2/+7
| | | | | | | As noted in #17912, `open` system calls were `safe` rather than `interruptible`. Consequently, the program could not be interrupted with SIGINT if stuck in a slow open operation. Fix this by marking `c_safe_open` as interruptible.
* Refactoring in TcSMonadSimon Peyton Jones2020-03-143-89/+78
| | | | | | | | | | | | | | | | | This patch is just refactoring: no change in behaviour. I removed the rather complicated checkConstraintsTcS checkTvConstraintsTcS in favour of simpler functions emitImplicationTcS emitTvImplicationTcS pushLevelNoWorkList The last of these is a little strange, but overall it's much better I think.
* Wrap an implication around class-sig kind errorsSimon Peyton Jones2020-03-148-18/+37
| | | | | | | | | | | | | | | | | | | | | Ticket #17841 showed that we can get a kind error in a class signature, but lack an enclosing implication that binds its skolems. This patch * Adds the wrapping implication: the new call to checkTvConstraints in tcClassDecl1 * Simplifies the API to checkTvConstraints, which was not otherwise called at all. * Simplifies TcErrors.report_unsolved by *not* initialising the TidyEnv from the typechecker lexical envt. It's enough to do so from the free vars of the unsolved constraints; and we get silly renamings if we add variables twice: once from the lexical scope and once from the implication constraint.
* gitlab-ci: Add integer-simple release build for WindowsBen Gamari2020-03-141-0/+7
| | | | Closes #16144.
* Improve CSE.combineAltsSimon Peyton Jones2020-03-146-25/+140
| | | | | | | | | | | | | This patch improves the way that CSE combines identical alternatives. See #17901. I'm still not happy about the duplication between CSE.combineAlts and GHC.Core.Utils.combineIdenticalAlts; see the Notes with those functions. But this patch is a step forward. Metric Decrease: T12425 T5642
* Simple refactor of cheapEqExprSimon Peyton Jones2020-03-141-16/+14
| | | | | No change in functionality. Just seems tidier (and signficantly more efficient) to deal with ticks directly than to call stripTicksTopE.
* nonmoving: Remove redundant bitmap clearingBen Gamari2020-03-141-2/+4
| | | | | nonmovingSweep already clears the bitmap in the sweep loop. There is no reason to do so a second time.
* nonmoving: Don't traverse filled segment list in pauseBen Gamari2020-03-142-18/+26
| | | | | | | | | The non-moving collector would previously walk the entire filled segment list during the preparatory pause. However, this is far more work than is strictly necessary. We can rather get away with merely collecting the allocators' filled segment list heads and process the lists themselves during the concurrent phase. This can significantly reduce the maximum gen1 GC pause time in programs with high rates of long-lived allocations.
* Use AC_ARG_VAR for LD_STAGE0 and AR_STAGE0.Judah Jacobson2020-03-141-1/+4
|
* Allow overriding LD_STAGE0 and AR_STAGE0 in the configure script.Judah Jacobson2020-03-141-3/+7
| | | | | | | | | Previously it was possible to override the stage0 C compiler via `CC_STAGE0`, but you couldn't override `ld` or `ar` in stage0. This change allows overriding them by setting `LD_STAGE0` or `AR_STAGE0`, respectively. Our team uses this feature internally to take more control of our GHC build and make it run more hermetically.
* Use correct option name (-opti) (fix #17314)Sylvain Henry2020-03-143-10/+6
| | | | s/pgmo/opti
* Make: refactor GMP rulesSylvain Henry2020-03-141-7/+8
| | | | Document and use simpler rules for the ghc-gmp.h header.
* Hadrian: fix absolute buildroot support (#17822)Sylvain Henry2020-03-145-24/+24
| | | | Shake's "**" wildcard doesn't match absolute root. We must use "//" instead.
* Fix LintSimon Peyton Jones2020-03-143-56/+147
| | | | | | | | | | | | | | Ticket #17590 pointed out a bug in the way the linter dealt with type lets, exposed by the new uniqAway story. The fix is described in Note [Linting type lets]. I ended up putting the in-scope Ids in a different env field, le_ids, rather than (as before) sneaking them into the TCvSubst. Surprisingly tiresome, but done. Metric Decrease: hie002
* gitlab-ci: Distinguish integer-simple test envsBen Gamari2020-03-131-1/+1
| | | Previously two integer-simple jobs declared the same test environment. One (the nightly job) was built in the perf way, the other in the validate way. Consequently they had appreciably different performance characteristics, causing in the nightly job to spuriously fail with performance changes.
* gitlab-ci: Rework triggering of release buildsBen Gamari2020-03-131-1/+2
| | | | Use a push option instead of tagging.
* Update documentation for closureSizePaavo2020-03-131-1/+2
|
* Rename isDllNameSylvain Henry2020-03-133-12/+9
| | | | | | I wanted to fix the dangling comment in `isDllName` ("This is the cause of #", #8696 is already mentioned earlier). I took the opportunity to change the function name to better reflect what it does.
* hadrian: improve dependency tracking for the check-* programsAlp Mestanogullari2020-03-133-34/+36
| | | | | | | | | | | | | | | | | | | | | | | The code in Rules.Register responsible for finding all the build artifacts that Cabal installs when registering a library (static/shared libs, .hi files, ...) was looking in the wrong place. This patch fixes that logic and makes sure we gather all those artifacts in a list to declare that the rule for a given `.conf` file, our proxy for "Hadrian, please install this package in the package db for this stage", also produces those artifacts under the said package database. We also were completely missing some logic to declare that the check-* programs have dependencies besides their source code, at least when testing an in-tree compiler. Finally, this patch also removes redundant packages from 'testsuitePackages', since they should already be covered by the stage<N>Packages lists from Settings.Default. With this patch, after a complete build and freezing stage 1, a change to `compiler/parser/Parser.y` results in rebuilding the ghc lib, reinstalling it, and rebuilding the few programs that depend on it, _including_ `check-ppr` and `check-api-annotations` (therefore fixing #17273).
* Expose compulsory unfoldings alwaysSimon Peyton Jones2020-03-126-33/+68
| | | | | | | | | | | | The unsafeCoerce# patch requires that unsafeCoerce# has a compulsory unfolding that is always available. So we have to be careful to expose compulsory unfoldings unconditionally and consistently. We didn't get this quite right: #17871. This patch fixes it. No real surprises here. See Note [Always expose compulsory unfoldings] in GHC.Iface.Tidy
* pretty-printer: Do not print ApplicativeDo joinKirill Elagin2020-03-124-4/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Do not print `join` in ApplictiveStmt, unless ppr-debug * Print parens around multiple parallel binds When ApplicativeDo is enabled, the renamer analyses the statements of a `do` block and in certain cases marks them as needing to be rewritten using `join`. For example, if you have: ``` foo = do a <- e1 b <- e2 doSomething a b ``` it will be desugared into: ``` foo = join (doSomething <$> e1 <*> e2) ``` After renaming but before desugaring the expression is stored essentially as: ``` foo = do [will need join] (a <- e1 | b <- e2) [no return] doSomething a b ``` Before this change, the pretty printer would print a call to `join`, even though it is not needed at this stage at all. The expression will be actually rewritten into one using join only at desugaring, at which point a literal call to join will be inserted.
* pretty-printer: Properly parenthesise LastStmtKirill Elagin2020-03-127-14/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | After ApplicatveDo strips the last `return` during renaming, the pretty printer has to restore it. However, if the return was followed by `$`, the dollar was stripped too and not restored. For example, the last stamement in: ``` foo = do x <- ... ... return $ f x ``` would be printed as: ``` return f x ``` This commit preserved the dolar, so it becomes: ``` return $ f x ```
* Make DeriveFunctor-generated code require fewer beta reductionsRyan Scott2020-03-119-125/+310
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue #17880 demonstrates that `DeriveFunctor`-generated code is surprisingly fragile when rank-_n_ types are involved. The culprit is that `$fmap` (the algorithm used to generate `fmap` implementations) was too keen on applying arguments with rank-_n_ types to lambdas, which fail to typecheck more often than not. In this patch, I change `$fmap` (both the specification and the implementation) to produce code that avoids creating as many lambdas, avoiding problems when rank-_n_ field types arise. See the comments titled "Functor instances" in `TcGenFunctor` for a more detailed description. Not only does this fix #17880, but it also ensures that the code that `DeriveFunctor` generates will continue to work after simplified subsumption is implemented (see #17775). What is truly amazing is that #17880 is actually a regression (introduced in GHC 7.6.3) caused by commit 49ca2a37bef18aa57235ff1dbbf1cc0434979b1e, the fix #7436. Prior to that commit, the version of `$fmap` that was used was almost identical to the one used in this patch! Why did that commit change `$fmap` then? It was to avoid severe performance issues that would arise for recursive `fmap` implementations, such as in the example below: ```hs data List a = Nil | Cons a (List a) deriving Functor -- ===> instance Functor List where fmap f Nil = Nil fmap f (Cons x xs) = Cons (f x) (fmap (\y -> f y) xs) ``` The fact that `\y -> f y` was eta expanded caused significant performance overheads. Commit 49ca2a37bef18aa57235ff1dbbf1cc0434979b1e fixed this performance issue, but it went too far. As a result, this patch partially reverts 49ca2a37bef18aa57235ff1dbbf1cc0434979b1e. To ensure that the performance issues pre-#7436 do not resurface, I have taken some precautionary measures: * I have added a special case to `$fmap` for situations where the last type variable in an application of some type occurs directly. If this special case fires, we avoid creating a lambda expression. This ensures that we generate `fmap f (Cons x xs) = Cons (f x) (fmap f xs)` in the derived `Functor List` instance above. For more details, see `Note [Avoid unnecessary eta expansion in derived fmap implementations]` in `TcGenFunctor`. * I have added a `T7436b` test case to ensure that the performance of this derived `Functor List`-style code does not regress. When implementing this, I discovered that `$replace`, the algorithm which generates implementations of `(<$)`, has a special case that is very similar to the `$fmap` special case described above. `$replace` marked this special case with a custom `Replacer` data type, which was a bit overkill. In order to use the same machinery for both `Functor` methods, I ripped out `Replacer` and instead implemented a simple way to detect the special case. See the updated commentary in `Note [Deriving <$]` for more details.
* Use a Set to represent WaysSylvain Henry2020-03-1112-52/+61
| | | | | | | | Should make `member` queries faster and avoid messing up with missing `nubSort`. Metric Increase: hie002