delta/haskell.git - gitlab.haskell.org: ghc/ghc.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	rts/WSDeque: Rewrite with proper atomics	Ben Gamari	2020-12-01	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \|	After a few attempts at shoring up the previous implementation, I ended up turning to the literature and now use the proven implementation, > N.M. Lê, A. Pop, A.Cohen, and F.Z. Nardelli. "Correct and Efficient > Work-Stealing for Weak Memory Models". PPoPP'13, February 2013, > ACM 978-1-4503-1922/13/02. Note only is this approach formally proven correct under C11 semantics but it is also proved to be a bit faster in practice.
*	rts: Use relaxed ordering on spinlock counters	Ben Gamari	2020-12-01	1	-0/+1
\|
*	Capabiliity: Properly fix data race on n_returning_tasks	Ben Gamari	2020-12-01	1	-0/+1
\| \| \| \| \|	There is a real data race but can be made safe by using proper atomic (but relaxed) accesses.
*	SMP.h: Add C11-style atomic operations	Ben Gamari	2020-12-01	1	-1/+60
\|
*	Implement shrinkSmallMutableArray# and resizeSmallMutableArray#.	Andrew Martin	2019-10-26	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a part of GHC Proposal #25: "Offer more array resizing primitives". Resources related to the proposal: - Discussion: https://github.com/ghc-proposals/ghc-proposals/pull/121 - Proposal: https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0025-resize-boxed.rst Only shrinkSmallMutableArray# is implemented as a primop since a library-space implementation of resizeSmallMutableArray# (in GHC.Exts) is no less efficient than a primop would be. This may be replaced by a primop in the future if someone devises a strategy for growing arrays in-place. The library-space implementation always copies the array when growing it. This commit also tweaks the documentation of the deprecated sizeofMutableByteArray#, removing the mention of concurrency. That primop is unsound even in single-threaded applications. Additionally, the non-negativity assertion on the existing shrinkMutableByteArray# primop has been removed since this predicate is trivially always true.
*	Merge non-moving garbage collector	Ben Gamari	2019-10-23	3	-0/+33
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This introduces a concurrent mark & sweep garbage collector to manage the old generation. The concurrent nature of this collector typically results in significantly reduced maximum and mean pause times in applications with large working sets. Due to the large and intricate nature of the change I have opted to preserve the fully-buildable history, including merge commits, which is described in the "Branch overview" section below. Collector design ================ The full design of the collector implemented here is described in detail in a technical note > B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell > Compiler" (2018) This document can be requested from @bgamari. The basic heap structure used in this design is heavily inspired by > K. Ueno & A. Ohori. "A fully concurrent garbage collector for > functional programs on multicore processors." /ACM SIGPLAN Notices/ > Vol. 51. No. 9 (presented at ICFP 2016) This design is intended to allow both marking and sweeping concurrent to execution of a multi-core mutator. Unlike the Ueno design, which requires no global synchronization pauses, the collector introduced here requires a stop-the-world pause at the beginning and end of the mark phase. To avoid heap fragmentation, the allocator consists of a number of fixed-size /sub-allocators/. Each of these sub-allocators allocators into its own set of /segments/, themselves allocated from the block allocator. Each segment is broken into a set of fixed-size allocation blocks (which back allocations) in addition to a bitmap (used to track the liveness of blocks) and some additional metadata (used also used to track liveness). This heap structure enables collection via mark-and-sweep, which can be performed concurrently via a snapshot-at-the-beginning scheme (although concurrent collection is not implemented in this patch). Implementation structure ======================== The majority of the collector is implemented in a handful of files: * `rts/Nonmoving.c` is the heart of the beast. It implements the entry-point to the nonmoving collector (`nonmoving_collect`), as well as the allocator (`nonmoving_allocate`) and a number of utilities for manipulating the heap. * `rts/NonmovingMark.c` implements the mark queue functionality, update remembered set, and mark loop. * `rts/NonmovingSweep.c` implements the sweep loop. * `rts/NonmovingScav.c` implements the logic necessary to scavenge the nonmoving heap. Branch overview =============== ``` * wip/gc/opt-pause: \| A variety of small optimisations to further reduce pause times. \| * wip/gc/compact-nfdata: \| Introduce support for compact regions into the non-moving \|\ collector \| \ \| \ \| \| * wip/gc/segment-header-to-bdescr: \| \| \| Another optimization that we are considering, pushing \| \| \| some segment metadata into the segment descriptor for \| \| \| the sake of locality during mark \| \| \| \| * \| wip/gc/shortcutting: \| \| \| Support for indirection shortcutting and the selector optimization \| \| \| in the non-moving heap. \| \| \| * \| \| wip/gc/docs: \| \|/ Work on implementation documentation. \| / \|/ * wip/gc/everything: \| A roll-up of everything below. \|\ \| \ \| \|\ \| \| \ \| \| * wip/gc/optimize: \| \| \| A variety of optimizations, primarily to the mark loop. \| \| \| Some of these are microoptimizations but a few are quite \| \| \| significant. In particular, the prefetch patches have \| \| \| produced a nontrivial improvement in mark performance. \| \| \| \| \| * wip/gc/aging: \| \| \| Enable support for aging in major collections. \| \| \| \| * \| wip/gc/test: \| \| \| Fix up the testsuite to more or less pass. \| \| \| * \| \| wip/gc/instrumentation: \| \| \| A variety of runtime instrumentation including statistics \| \| / support, the nonmoving census, and eventlog support. \| \|/ \| / \|/ * wip/gc/nonmoving-concurrent: \| The concurrent write barriers. \| * wip/gc/nonmoving-nonconcurrent: \| The nonmoving collector without the write barriers necessary \| for concurrent collection. \| * wip/gc/preparation: \| A merge of the various preparatory patches that aren't directly \| implementing the GC. \| \| * GHC HEAD . . . ```
\| *	Fix unregisterised buildwip/gc/nonmoving-concurrent	Ben Gamari	2019-10-22	2	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \|	This required some fiddling around with the location of forward declarations since the C sources generated by GHC's C backend only includes Stg.h.
\| *	rts: Shrink size of STACK's dirty and marking fields	Ben Gamari	2019-10-20	1	-0/+19
\| \|
\| *	rts: Implement concurrent collection in the nonmoving collector	Ben Gamari	2019-10-20	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This extends the non-moving collector to allow concurrent collection. The full design of the collector implemented here is described in detail in a technical note B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell Compiler" (2018) This extension involves the introduction of a capability-local remembered set, known as the /update remembered set/, which tracks objects which may no longer be visible to the collector due to mutation. To maintain this remembered set we introduce a write barrier on mutations which is enabled while a concurrent mark is underway. The update remembered set representation is similar to that of the nonmoving mark queue, being a chunked array of `MarkEntry`s. Each `Capability` maintains a single accumulator chunk, which it flushed when it (a) is filled, or (b) when the nonmoving collector enters its post-mark synchronization phase. While the write barrier touches a significant amount of code it is conceptually straightforward: the mutator must ensure that the referee of any pointer it overwrites is added to the update remembered set. However, there are a few details: * In the case of objects with a dirty flag (e.g. `MVar`s) we can exploit the fact that only the first mutation requires a write barrier. * Weak references, as usual, complicate things. In particular, we must ensure that the referee of a weak object is marked if dereferenced by the mutator. For this we (unfortunately) must introduce a read barrier, as described in Note [Concurrent read barrier on deRefWeak#] (in `NonMovingMark.c`). * Stable names are also a bit tricky as described in Note [Sweeping stable names in the concurrent collector] (`NonMovingSweep.c`). We take quite some pains to ensure that the high thread count often seen in parallel Haskell applications doesn't affect pause times. To this end we allow thread stacks to be marked either by the thread itself (when it is executed or stack-underflows) or the concurrent mark thread (if the thread owning the stack is never scheduled). There is a non-trivial handshake to ensure that this happens without racing which is described in Note [StgStack dirtiness flags and concurrent marking]. Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
* \|	Implement s390x LLVM backend.	Stefan Schulze Frielinghaus	2019-10-22	3	-0/+82
\| \| \| \| \| \| \| \| \| \| \| \|	This patch adds support for the s390x architecture for the LLVM code generator. The patch includes a register mapping of STG registers onto s390x machine registers which enables a registerised build.
* \|	Windows: Update tarballs to GCC 9.2 and remove MAX_PATH limit.	Tamar Christina	2019-10-20	1	-1/+8
\|/
*	Deduplicate `HaskellMachRegs.h` and `RtsMachRegs.h` headers	John Ericson	2019-09-17	2	-69/+3
\| \| \| \| \| \| \|	Until 0472f0f6a92395d478e9644c0dbd12948518099f there was a meaningful host vs target distinction (though it wasn't used right, in genapply). After that, they did not differ in meaningful ways, so it's best to just only keep one.
*	Module hierarchy: StgToCmm (#13009)	Sylvain Henry	2019-09-10	2	-6/+6
\| \| \| \| \| \|	Add StgToCmm module hierarchy. Platform modules that are used in several other places (NCG, LLVM codegen, Cmm transformations) are put into GHC.Platform.
*	Remove most uses of TARGET platform macros	John Ericson	2019-07-09	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These prevent multi-target builds. They were gotten rid of in 3 ways: 1. In the compiler itself, replacing `#if` with runtime `if`. In these cases, we care about the target platform still, but the target platform is dynamic so we must delay the elimination to run time. 2. In the compiler itself, replacing `TARGET` with `HOST`. There was just one bit of this, in some code splitting strings representing lists of paths. These paths are used by GHC itself, and not by the compiled binary. (They are compiler lookup paths, rather than RPATHS or something that does matter to the compiled binary, and thus would legitamentally be target-sensative.) As such, the path-splitting method only depends on where GHC runs and not where code it produces runs. This should have been `HOST` all along. 3. Changing the RTS. The RTS doesn't care about the target platform, full stop. 4. `includes/stg/HaskellMachRegs.h` This file is also included in the genapply executable. This is tricky because the RTS's host platform really is that utility's target platform. so that utility really really isn't multi-target either. But at least it isn't an installed part of GHC, but just a one-off tool when building the RTS. Lying with the `HOST` to a one-off program (genapply) that isn't installed doesn't seem so bad. It's certainly better than the other way around of lying to the RTS though not to genapply. The RTS is more important, and it is installed, and this header is installed as part of the RTS.
*	Correct closure observation, construction, and mutation on weak memory machines.	Travis Whitaker	2019-06-28	1	-0/+145
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Here the following changes are introduced: - A read barrier machine op is added to Cmm. - The order in which a closure's fields are read and written is changed. - Memory barriers are added to RTS code to ensure correctness on out-or-order machines with weak memory ordering. Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this is lowered to an instruction that ensures memory reads that occur after said instruction in program order are not performed before reads coming before said instruction in program order. On machines with strong memory ordering properties (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so MO_ReadBarrier is simply erased. However, such an instruction is necessary on weakly ordered machines, e.g. ARM and PowerPC. Weam memory ordering has consequences for how closures are observed and mutated. For example, consider a closure that needs to be updated to an indirection. In order for the indirection to be safe for concurrent observers to enter, said observers must read the indirection's info table before they read the indirectee. Furthermore, the entering observer makes assumptions about the closure based on its info table contents, e.g. an INFO_TYPE of IND imples the closure has an indirectee pointer that is safe to follow. When a closure is updated with an indirection, both its info table and its indirectee must be written. With weak memory ordering, these two writes can be arbitrarily reordered, and perhaps even interleaved with other threads' reads and writes (in the absence of memory barrier instructions). Consider this example of a bad reordering: - An updater writes to a closure's info table (INFO_TYPE is now IND). - A concurrent observer branches upon reading the closure's INFO_TYPE as IND. - A concurrent observer reads the closure's indirectee and enters it. (!!!) - An updater writes the closure's indirectee. Here the update to the indirectee comes too late and the concurrent observer has jumped off into the abyss. Speculative execution can also cause us issues, consider: - An observer is about to case on a value in closure's info table. - The observer speculatively reads one or more of closure's fields. - An updater writes to closure's info table. - The observer takes a branch based on the new info table value, but with the old closure fields! - The updater writes to the closure's other fields, but its too late. Because of these effects, reads and writes to a closure's info table must be ordered carefully with respect to reads and writes to the closure's other fields, and memory barriers must be placed to ensure that reads and writes occur in program order. Specifically, updates to a closure must follow the following pattern: - Update the closure's (non-info table) fields. - Write barrier. - Update the closure's info table. Observing a closure's fields must follow the following pattern: - Read the closure's info pointer. - Read barrier. - Read the closure's (non-info table) fields. This patch updates RTS code to obey this pattern. This should fix long-standing SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting out-of-order execution) and PowerPC. This fixes issue #15449. Co-Authored-By: Ben Gamari <ben@well-typed.com>
*	Add support for bitreverse primop	Alexandre	2019-04-01	1	-0/+8
\| \| \| \| \| \|	This commit includes the necessary changes in code and documentation to support a primop that reverses a word's bits. It also includes a test.
*	Update Wiki URLs to point to GitLab	Takenobu Tani	2019-03-25	10	-12/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This moves all URL references to Trac Wiki to their corresponding GitLab counterparts. This substitution is classified as follows: 1. Automated substitution using sed with Ben's mapping rule [1] Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy... New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy... 2. Manual substitution for URLs containing `#` index Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz 3. Manual substitution for strings starting with `Commentary` Old: Commentary/XxxYyy... New: commentary/xxx-yyy... See also !539 [1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
*	Fix specification of load_load_barrier [skip-ci]	Peter Trommler	2019-03-20	1	-1/+1
\|
*	ghc-heap: Introduce closureSize	Ben Gamari	2019-03-17	1	-0/+1
\| \| \| \| \| \|	This function allows the user to compute the (non-transitive) size of a heap object in words. The "closure" in the name is admittedly confusing but we are stuck with this nomenclature at this point.
*	PPC NCG: Use liveness information in CmmCall	Peter Trommler	2019-03-15	1	-5/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We make liveness information for global registers available on `JMP` and `BCTR`, which were the last instructions missing. With complete liveness information we do not need to reserve global registers in `freeReg` anymore. Moreover we assign R9 and R10 to callee saves registers. Cleanup by removing `Reg_Su`, which was unused, from `freeReg` and removing unused register definitions. The calculation of the number of floating point registers is too conservative. Just follow X86 and specify the constants directly. Overall on PowerPC this results in 0.3 % smaller code size in nofib while runtime is slightly better in some tests.
*	Documentation and refactoring in CCS related code	Ömer Sinan Ağacan	2019-01-12	1	-4/+0
\| \| \| \| \| \| \| \| \|	- Remove REGISTER_CC and REGISTER_CCS macros, add functions registerCC and registerCCS to Profiling.c. - Reduce scope of symbols: CC_LIST, CCS_LIST, CC_ID, CCS_ID - Document CC_LIST and CCS_LIST
*	PPC NCG: Remove Darwin support	Peter Trommler	2019-01-01	1	-16/+0
\| \| \| \| \| \| \|	Support for Mac OS X on PowerPC has been dropped by Apple years ago. We follow suit and remove PowerPC support for Darwin. Fixes #16106.
*	Finish stable split	David Feuer	2018-08-29	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Long ago, the stable name table and stable pointer tables were one. Now, they are separate, and have significantly different implementations. I believe the time has come to finish the split that began in #7674. * Divide `rts/Stable` into `rts/StableName` and `rts/StablePtr`. * Give each table its own mutex. * Add FFI functions `hs_lock_stable_ptr_table` and `hs_unlock_stable_ptr_table` and document them. These are intended to replace the previously undocumented `hs_lock_stable_tables` and `hs_lock_stable_tables`, which are now documented as deprecated synonyms. * Make `eqStableName#` use pointer equality instead of unnecessarily comparing stable name table indices. Reviewers: simonmar, bgamari, erikd Reviewed By: bgamari Subscribers: rwbarton, carter GHC Trac Issues: #15555 Differential Revision: https://phabricator.haskell.org/D5084
*	Add traceBinaryEvent# primop	Mitsutoshi Aoe	2018-08-21	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a new primop called traceBinaryEvent# that takes the length of binary data and a pointer to the data, then emits it to the eventlog. There is some example code that uses this primop and the new event: * [traceBinaryEventIO][1] that calls `traceBinaryEvent#` * [A patch to ghc-events][2] that parses the new `EVENT_USER_BINARY_MSG` There's no corresponding issue on Trac but it was discussed at ghc-devs [3]. [1] https://github.com/maoe/ghc-trace-events/blob /fb226011ef1f85a97b4da7cc9d5f98f9fe6316ae/src/Debug/Trace/Binary.hs#L29) [2] https://github.com/maoe/ghc-events/commit /239ca77c24d18cdd10d6d85a0aef98e4a7c56ae6) [3] https://mail.haskell.org/pipermail/ghc-devs/2018-May/015791.html Reviewers: bgamari, erikd, simonmar Reviewed By: bgamari Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D5007
*	Replace atomicModifyMutVar#	David Feuer	2018-07-15	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: simonmar, hvr, bgamari, erikd, fryguybob, rrnewton Reviewed By: simonmar Subscribers: fryguybob, rwbarton, thomie, carter GHC Trac Issues: #15364 Differential Revision: https://phabricator.haskell.org/D4884
*	Rename some mutable closure types for consistency	Ömer Sinan Ağacan	2018-06-05	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SMALL_MUT_ARR_PTRS_FROZEN0 -> SMALL_MUT_ARR_PTRS_FROZEN_DIRTY SMALL_MUT_ARR_PTRS_FROZEN -> SMALL_MUT_ARR_PTRS_FROZEN_CLEAN MUT_ARR_PTRS_FROZEN0 -> MUT_ARR_PTRS_FROZEN_DIRTY MUT_ARR_PTRS_FROZEN -> MUT_ARR_PTRS_FROZEN_CLEAN Naming is now consistent with other CLEAR/DIRTY objects (MVAR, MUT_VAR, MUT_ARR_PTRS). (alternatively we could rename MVAR_DIRTY/MVAR_CLEAN etc. to MVAR0/MVAR) Removed a few comments in Scav.c about FROZEN0 being on the mut_list because it's now clear from the closure type. Reviewers: bgamari, simonmar, erikd Reviewed By: simonmar Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4784
*	rts: Rip out support for STM invariants	Ben Gamari	2018-06-02	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This feature has some very serious correctness issues (#14310), introduces a great deal of complexity, and hasn't seen wide usage. Consequently we are removing it, as proposed in Proposal #77 [1]. This is heavily based on a patch from fryguybob. Updates stm submodule. [1] https://github.com/ghc-proposals/ghc-proposals/pull/77 Test Plan: Validate Reviewers: erikd, simonmar, hvr Reviewed By: simonmar Subscribers: rwbarton, thomie, carter GHC Trac Issues: #14310 Differential Revision: https://phabricator.haskell.org/D4760
*	An overhaul of the SRT representation	Simon Marlow	2018-05-16	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: - Previously we would hvae a single big table of pointers per module, with a set of bitmaps to reference entries within it. The new representation is identical to a static constructor, which is much simpler for the GC to traverse, and we get to remove the complicated bitmap-traversal code from the GC. - Rewrite all the code to generate SRTs in CmmBuildInfoTables, and document it much better (see Note [SRTs]). This has been something I've wanted to do since we moved to the new code generator, I finally had the opportunity to finish it while on a transatlantic flight recently :) There are a series of 4 diffs: 1. D4632 (this one), which does the bulk of the changes 2. D4633 which adds support for smaller `CmmLabelDiffOff` constants 3. D4634 which takes advantage of D4632 and D4633 to save a word in info tables that have an SRT on x86_64. This is where most of the binary size improvement comes from. 4. D4637 which makes a further optimisation to merge some SRTs with static FUN closures. This adds some complexity and the benefits are fairly modest, so it's not clear yet whether we should do this. Results (after (3), on x86_64) - GHC itself (staticaly linked) is 5.2% smaller - -1.7% binary sizes in nofib, -2.9% module sizes. Full nofib results: P176 - I measured the overhead of traversing all the static objects in a major GC in GHC itself by doing `replicateM_ 1000 performGC` as the first thing in `Main.main`. The new version was 5-10% faster, but the results did vary quite a bit. - I'm not sure if there's a compile-time difference, the results are too unreliable. Test Plan: validate Reviewers: bgamari, michalt, niteria, simonpj, erikd, osa1 Subscribers: thomie, carter Differential Revision: https://phabricator.haskell.org/D4632
*	Improve accuracy of get/setAllocationCounter	Ben Gamari	2018-03-19	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: get/setAllocationCounter didn't take into account allocations in the current block. This was known at the time, but it turns out to be important to have more accuracy when using these in a fine-grained way. Test Plan: New unit test to test incrementally larger allocaitons. Before I got results like this: ``` +0 +0 +0 +0 +0 +4096 +0 +0 +0 +0 +0 +4064 +0 +0 +4088 +4056 +0 +0 +0 +4088 +4096 +4056 +4096 ``` Notice how the results aren't always monotonically increasing. After this patch: ``` +344 +416 +488 +560 +632 +704 +776 +848 +920 +992 +1064 +1136 +1208 +1280 +1352 +1424 +1496 +1568 +1640 +1712 +1784 +1856 +1928 +2000 +2072 +2144 ``` Reviewers: hvr, erikd, simonmar, jrtc27, trommler Reviewed By: simonmar Subscribers: trommler, jrtc27, rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4363
*	UNREG: fix implicit declarations from pdep and pext	Sergei Trofimovich	2018-03-09	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unreg build failed as: $ ./configure --enable-unregisterised $ make HC [stage 1] libraries/ghc-prim/dist-install/build/GHC/PrimopWrappers.o ghc_1.hc: In function 'ghczmprim_GHCziPrimopWrappers_pdep8zh_entry': ghc_1.hc:1810:9: error: error: implicit declaration of function 'hs_pdep8'; did you mean 'hs_ctz8'? [-Werror=implicit-function-declaration] _c3jz = hs_pdep8(Sp, Sp[1]); ^~~~~~~~ hs_ctz8 \| 1810 \| _c3jz = hs_pdep8(Sp, Sp[1]); \| ^ Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
*	Revert "Improve accuracy of get/setAllocationCounter"	Ben Gamari	2018-01-18	1	-3/+0
\| \| \| \|	This reverts commit a1a689dda48113f3735834350fb562bb1927a633.
*	Improve accuracy of get/setAllocationCounter	Simon Marlow	2018-01-08	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: get/setAllocationCounter didn't take into account allocations in the current block. This was known at the time, but it turns out to be important to have more accuracy when using these in a fine-grained way. Test Plan: New unit test to test incrementally larger allocaitons. Before I got results like this: ``` +0 +0 +0 +0 +0 +4096 +0 +0 +0 +0 +0 +4064 +0 +0 +4088 +4056 +0 +0 +0 +4088 +4096 +4056 +4096 ``` Notice how the results aren't always monotonically increasing. After this patch: ``` +344 +416 +488 +560 +632 +704 +776 +848 +920 +992 +1064 +1136 +1208 +1280 +1352 +1424 +1496 +1568 +1640 +1712 +1784 +1856 +1928 +2000 +2072 +2144 ``` Reviewers: niteria, bgamari, hvr, erikd Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4288
*	Remove left-overs from compareByteArray# inline conversion	Herbert Valerio Riedel	2017-11-09	1	-1/+0
\| \| \| \| \| \| \|	These removes left-overs from e3ba26f8b49700b41ff4672f3f7f6a4e453acdcc where I implemented `compareByteArray#` as an out-of-line primop, which got optimised into an inline primop shortly afterwards (as per 7673561555ae354fd9eed8de1e57c681906e2d49).
*	Allow packing constructor fields	Michal Terepeta	2017-10-29	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is another step for fixing #13825 and is based on D38 by Simon Marlow. The change allows storing multiple constructor fields within the same word. This currently applies only to `Float`s, e.g., ``` data Foo = Foo {-# UNPACK #-} !Float {-# UNPACK #-} !Float ``` on 64-bit arch, will now store both fields within the same constructor word. For `WordX/IntX` we'll need to introduce new primop types. Main changes: - We now use sizes in bytes when we compute the offsets for constructor fields in `StgCmmLayout` and introduce padding if necessary (word-sized fields are still word-aligned) - `ByteCodeGen` had to be updated to correctly construct the data types. This required some new bytecode instructions to allow pushing things that are not full words onto the stack (and updating `Interpreter.c`). Note that we only use the packed stuff when constructing data types (i.e., for `PACK`), in all other cases the behavior should not change. - `RtClosureInspect` was changed to handle the new layout when extracting subterms. This seems to be used by things like `:print`. I've also added a test for this. - I deviated slightly from Simon's approach and use `PrimRep` instead of `ArgRep` for computing the size of fields. This seemed more natural and in the future we'll probably want to introduce new primitive types (e.g., `Int8#`) and `PrimRep` seems like a better place to do that (where we already have `Int64Rep` for example). `ArgRep` on the other hand seems to be more focused on calling functions. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: bgamari, simonmar, austin, hvr, goldfire, erikd Reviewed By: bgamari Subscribers: maoe, rwbarton, thomie GHC Trac Issues: #13825 Differential Revision: https://phabricator.haskell.org/D3809
*	Implement new `compareByteArrays#` primop	Herbert Valerio Riedel	2017-10-16	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new primop compareByteArrays# :: ByteArray# -> Int# {- offset -} -> ByteArray# -> Int# {- offset -} -> Int# {- length -} -> Int# allows to compare the subrange of the first `ByteArray#` to the (same-length) subrange of the second `ByteArray#` and returns a value less than, equal to, or greater than zero if the range is found, respectively, to be byte-wise lexicographically less than, to match, or be greater than the second range. Under the hood, the new primop is implemented in terms of the standard ISO C `memcmp(3)` function. It is currently an out-of-line primop but work is underway to optimise this into an inline primop for a future follow-up Differential (see D4091). This primop has applications in packages like `text`, `text-short`, `bytestring`, `text-containers`, `primitive`, etc. which currently have to incur the overhead of an ordinary FFI call to directly or indirectly invoke `memcmp(3)` as well has having to deal with some `unsafePerformIO`-variant. While at it, this also improves the documentation for the existing `copyByteArray#` primitive which has a non-trivial type-signature that significantly benefits from a more explicit description of its arguments. Reviewed By: bgamari Differential Revision: https://phabricator.haskell.org/D4090
*	Typos in comments [ci skip]	Gabor Greif	2017-07-06	1	-1/+1
\|
*	Fix a lost-wakeup bug in BLACKHOLE handling (#13751)	Simon Marlow	2017-06-08	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The problem occurred when * Threads A & B evaluate the same thunk * Thread A context-switches, so the thunk gets blackholed * Thread C enters the blackhole, creates a BLOCKING_QUEUE attached to the blackhole and thread A's `tso->bq` queue * Thread B updates the blackhole with a value, overwriting the BLOCKING_QUEUE * We GC, replacing A's update frame with stg_enter_checkbh * Throw an exception in A, which ignores the stg_enter_checkbh frame Now we have C blocked on A's tso->bq queue, but we forgot to check the queue because the stg_enter_checkbh frame has been thrown away by the exception. The solution and alternative designs are discussed in Note [upd-black-hole]. This also exposed a bug in the interpreter, whereby we were sometimes context-switching without calling `threadPaused()`. I've fixed this and added some Notes. Test Plan: * `cd testsuite/tests/concurrent && make slow` * validate Reviewers: niteria, bgamari, austin, erikd Reviewed By: erikd Subscribers: rwbarton, thomie GHC Trac Issues: #13751 Differential Revision: https://phabricator.haskell.org/D3630
*	Prefer #if defined to #ifdef	Ben Gamari	2017-04-28	6	-33/+33
\| \| \| \|	Our new CPP linter enforces this.
*	Enable new warning for fragile/incorrect CPP #if usage	Erik de Castro Lopo	2017-04-28	4	-40/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The C code in the RTS now gets built with `-Wundef` and the Haskell code (stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings whereever `#if` is used on undefined identifiers. Test Plan: Validate on Linux and Windows Reviewers: austin, angerman, simonmar, bgamari, Phyx Reviewed By: bgamari Subscribers: thomie, snowleopard Differential Revision: https://phabricator.haskell.org/D3278
*	compiler/cmm/PprC.hs: constify labels in .rodata	Sergei Trofimovich	2017-04-24	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Consider one-line module module B (v) where v = "hello" in -fvia-C mode it generates code like static char gibberish_str[] = "hello"; It resides in data section (precious resource on ia64!). The patch switches genrator to emit: static const char gibberish_str[] = "hello"; Other types if symbols that gained 'const' qualifier are: - info tables (from haskell and CMM) - static reference tables (from haskell and CMM) Cleanups along the way: - fixed info tables defined in .cmm to reside in .rodata - split out closure declaration into 'IC_' / 'EC_' - added label declaration (based on label type) right before each label definition (based on section type) so that C compiler could check if declaration and definition matches at definition site. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Test Plan: ran testsuite on unregisterised x86_64 compiler Reviewers: simonmar, ezyang, austin, bgamari, erikd Reviewed By: bgamari, erikd Subscribers: rwbarton, thomie GHC Trac Issues: #8996 Differential Revision: https://phabricator.haskell.org/D3481
*	cpp: Use #pragma once instead of #ifndef guards	Ben Gamari	2017-04-23	10	-41/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This both says what we mean and silences a bunch of spurious CPP linting warnings. This pragma is supported by all CPP implementations which we support. Reviewers: austin, erikd, simonmar, hvr Reviewed By: simonmar Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3482
*	hs_add_root() RTS API removal	Sergei Trofimovich	2017-04-17	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before ghc-7.2 hs_add_root() had to be used to initialize haskell modules when haskell was called from FFI. commit a52ff7619e8b7d74a9d933d922eeea49f580bca8 ("Change the way module initialisation is done (#3252, #4417)") removed needs for hs_add_root() and made function a no-op. For backward compatibility '__stginit_<module>' symbol was not removed. This change removes no-op hs_add_root() function and unused '__stginit_<module>' symbol from each haskell module. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Test Plan: ./validate Reviewers: simonmar, austin, bgamari, erikd Reviewed By: simonmar Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3460
*	Revert "Enable new warning for fragile/incorrect CPP #if usage"	Ben Gamari	2017-04-05	4	-83/+40
\| \| \| \| \| \| \| \|	This is causing too much platform dependent breakage at the moment. We will need a more rigorous testing strategy before this can be merged again. This reverts commit 7e340c2bbf4a56959bd1e95cdd1cfdb2b7e537c2.
*	Enable new warning for fragile/incorrect CPP #if usage	Erik de Castro Lopo	2017-04-05	4	-40/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The C code in the RTS now gets built with `-Wundef` and the Haskell code (stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings whereever `#if` is used on undefined identifiers. Test Plan: Validate on Linux and Windows Reviewers: austin, angerman, simonmar, bgamari, Phyx Reviewed By: bgamari Subscribers: thomie, snowleopard Differential Revision: https://phabricator.haskell.org/D3278
*	Fix comment (old file names) in includes/	Takenobu Tani	2017-02-04	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[skip ci] There ware some old file names (.lhs, ...) at comments. * includes/rts/Bytecodes.h - ghc/compiler/ghci/ByteCodeGen.lhs -> ByteCodeAsm.hs * includes/rts/Constants.h - libraries/base/GHC/Conc.lhs -> libraries/base/GHC/Conc/Sync.hs * includes/rts/storage/FunTypes.h - utils/genapply/GenApply.hs -> utils/genappl/Main.hs - compiler/codeGen/CgCallConv.lhs -> compiler/codeGen/StgCmmLayout.hs * includes/stg/MiscClosures.h - compiler/codeGen/CgStackery.lhs -> compiler/codeGen/StgCmmArgRep.hs - HeapStackCheck.hc -> HeapStackCheck.cmm Reviewers: bgamari, austin, simonmar, erikd Reviewed By: erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D3074
*	Spelling fixes in comments [ci skip]	Gabor Greif	2017-01-18	1	-2/+2
\|
*	More fixes for #5654	Simon Marlow	2017-01-06	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* In stg_ap_0_fast, if we're evaluating a thunk, the thunk might evaluate to a function in which case we may have to adjust its CCS. * The interpreter has its own implementation of stg_ap_0_fast, so we have to do the same shenanigans with creating empty PAPs and copying PAPs there. * GHCi creates Cost Centres as children of CCS_MAIN, which enterFunCCS() wrongly assumed to imply that they were CAFs. Now we use the is_caf flag for this, which we have to correctly initialise when we create a Cost Centre in GHCi.
*	UNREG: include CCS_OVERHEAD to STG	Sergei Trofimovich	2016-12-17	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 394231b301efb6b56654b0a480ab794fe3b7e4db aded CCS_OVERHEAD annotation to 'rts/Apply.cmm'. Before the change CCS_OVERHEAD was used only in C code. The change exports CCS_OVERHEAD to STG. Fixes UNREG build failure: rts_dist_HC rts/dist/build/Apply.p_o /tmp/ghc29563_0/ghc_4.hc: In function 'cm_entry': /tmp/ghc29563_0/ghc_4.hc:73:13: error: error: 'CCS_OVERHEAD' undeclared (first use in this function) *((P_)((W_)&CCS_OVERHEAD+72)) = ... ^~~~~~~~~~~~ Signed-off-by: Sergei Trofimovich <siarheit@google.com>
*	Overhaul of Compact Regions (#12455)	Simon Marlow	2016-12-07	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This commit makes various improvements and addresses some issues with Compact Regions (aka Compact Normal Forms). This was the most important thing I wanted to fix. Compaction previously prevented GC from running until it was complete, which would be a problem in a multicore setting. Now, we compact using a hand-written Cmm routine that can be interrupted at any point. When a GC is triggered during a sharing-enabled compaction, the GC has to traverse and update the hash table, so this hash table is now stored in the StgCompactNFData object. Previously, compaction consisted of a deepseq using the NFData class, followed by a traversal in C code to copy the data. This is now done in a single pass with hand-written Cmm (see rts/Compact.cmm). We no longer use the NFData instances, instead the Cmm routine evaluates components directly as it compacts. The new compaction is about 50% faster than the old one with no sharing, and a little faster on average with sharing (the cost of the hash table dominates when we're doing sharing). Static objects that don't (transitively) refer to any CAFs don't need to be copied into the compact region. In particular this means we often avoid copying Char values and small Int values, because these are static closures in the runtime. Each Compact# object can support a single compactAdd# operation at any given time, so the Data.Compact library now enforces mutual exclusion using an MVar stored in the Compact object. We now get exceptions rather than killing everything with a barf() when we encounter an object that cannot be compacted (a function, or a mutable object). We now also detect pinned objects, which can't be compacted either. The Data.Compact API has been refactored and cleaned up. A new compactSize operation returns the size (in bytes) of the compact object. Most of the documentation is in the Haddock docs for the compact library, which I've expanded and improved here. Various comments in the code have been improved, especially the main Note [Compact Normal Forms] in rts/sm/CNF.c. I've added a few tests, and expanded a few of the tests that were there. We now also run the tests with GHCi, and in a new test way that enables sanity checking (+RTS -DS). There's a benchmark in libraries/compact/tests/compact_bench.hs for measuring compaction speed and comparing sharing vs. no sharing. The field totalDataW in StgCompactNFData was unnecessary. Test Plan: * new unit tests * validate * tested manually that we can compact Data.Aeson data Reviewers: gcampax, bgamari, ezyang, austin, niteria, hvr, erikd Subscribers: thomie, simonpj Differential Revision: https://phabricator.haskell.org/D2751 GHC Trac Issues: #12455
*	Make start address of `osReserveHeapMemory` tunable via command line -xb	Francesco Mazzoli	2016-09-09	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: We stumbled upon a case where an external library (OpenCL) does not work if a specific address (0x200000000) is taken. It so happens that `osReserveHeapMemory` starts trying to mmap at 0x200000000: ``` void hint = (void)((W_)8 * (1 << 30) + attempt * BLOCK_SIZE); at = osTryReserveHeapMemory(*len, hint); ``` This makes it impossible to use Haskell programs compiled with GHC 8 with C functions that use OpenCL. See this example https://github.com/chpatrick/oclwtf for a repro. This patch allows the user to work around this kind of behavior outside our control by letting the user override the starting address through an RTS command line flag. Reviewers: bgamari, Phyx, simonmar, erikd, austin Reviewed By: Phyx, simonmar Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D2513