path: root/rts
Commit message | Author | Age | Files | Lines
* Lookup _GLOBAL_OFFSET_TABLE by symbol->addr when doing relocations | Edward Amsden | 2019-07-02 | 1 | -1/+1
* Add _GLOBAL_OFFSET_TABLE_ support | Moritz Angermann | 2019-07-02 | 2 | -5/+38
  This adds lookup logic for _GLOBAL_OFFSET_TABLE_ as well as relocation logic for R_ARM_BASE_PREL and R_ARM_GOT_BREL, which the GNU toolchain (gas, gcc, ...) prefers to produce. Apparently recent LLVM toolchains will produce those as well.
* rts: Assert that LDV profiling isn't used with parallel GC [wip/memory-barriers] | Ben Gamari | 2019-06-28 | 1 | -0/+3
  I'm not entirely sure we are careful about ensuring this; this is a last-ditch check.
* Correct closure observation, construction, and mutation on weak memory machines. | Travis Whitaker | 2019-06-28 | 21 | -48/+177
  Here the following changes are introduced:
  - A read barrier machine op is added to Cmm.
  - The order in which a closure's fields are read and written is changed.
  - Memory barriers are added to RTS code to ensure correctness on out-of-order machines with weak memory ordering.

  Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this is lowered to an instruction that ensures memory reads that occur after said instruction in program order are not performed before reads coming before said instruction in program order. On machines with strong memory ordering properties (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so MO_ReadBarrier is simply erased. However, such an instruction is necessary on weakly ordered machines, e.g. ARM and PowerPC.

  Weak memory ordering has consequences for how closures are observed and mutated. For example, consider a closure that needs to be updated to an indirection. In order for the indirection to be safe for concurrent observers to enter, said observers must read the indirection's info table before they read the indirectee. Furthermore, the entering observer makes assumptions about the closure based on its info table contents, e.g. an INFO_TYPE of IND implies the closure has an indirectee pointer that is safe to follow.

  When a closure is updated with an indirection, both its info table and its indirectee must be written. With weak memory ordering, these two writes can be arbitrarily reordered, and perhaps even interleaved with other threads' reads and writes (in the absence of memory barrier instructions). Consider this example of a bad reordering:
  - An updater writes to a closure's info table (INFO_TYPE is now IND).
  - A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
  - A concurrent observer reads the closure's indirectee and enters it. (!!!)
  - An updater writes the closure's indirectee.

  Here the update to the indirectee comes too late and the concurrent observer has jumped off into the abyss. Speculative execution can also cause us issues. Consider:
  - An observer is about to case on a value in closure's info table.
  - The observer speculatively reads one or more of closure's fields.
  - An updater writes to closure's info table.
  - The observer takes a branch based on the new info table value, but with the old closure fields!
  - The updater writes to the closure's other fields, but it's too late.

  Because of these effects, reads and writes to a closure's info table must be ordered carefully with respect to reads and writes to the closure's other fields, and memory barriers must be placed to ensure that reads and writes occur in program order. Specifically, updates to a closure must follow the following pattern:
  - Update the closure's (non-info table) fields.
  - Write barrier.
  - Update the closure's info table.

  Observing a closure's fields must follow the following pattern:
  - Read the closure's info pointer.
  - Read barrier.
  - Read the closure's (non-info table) fields.

  This patch updates RTS code to obey this pattern. This should fix long-standing SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting out-of-order execution) and PowerPC. This fixes issue #15449.

  Co-Authored-By: Ben Gamari <ben@well-typed.com>
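  The update and observation patterns described in this commit message can be sketched with C11 atomics. This is an illustration only, not the RTS code: the `Closure` struct, the `TAG_*` constants, and both function names are hypothetical, and C11 release/acquire fences stand in for the RTS's write and read barriers.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for info table contents; not the RTS definitions. */
enum { TAG_THUNK = 0, TAG_IND = 1 };

/* Hypothetical, simplified closure: an info word plus payload fields. */
typedef struct Closure {
    _Atomic int info;            /* stands in for the info pointer      */
    struct Closure *indirectee;  /* only meaningful once info == TAG_IND */
    int value;
} Closure;

/* Updater: write the fields, then a write barrier, then the info word. */
static void update_to_indirection(Closure *c, Closure *target)
{
    c->indirectee = target;                                        /* 1. non-info fields */
    atomic_thread_fence(memory_order_release);                     /* 2. write barrier   */
    atomic_store_explicit(&c->info, TAG_IND, memory_order_relaxed); /* 3. info word      */
}

/* Observer: read the info word, then a read barrier, then the fields. */
static Closure *observe(Closure *c)
{
    int tag = atomic_load_explicit(&c->info, memory_order_relaxed); /* 1. info word    */
    atomic_thread_fence(memory_order_acquire);                      /* 2. read barrier */
    if (tag == TAG_IND) {
        return c->indirectee;                                       /* 3. other fields */
    }
    return c;
}
```

  With this ordering, an observer that sees `TAG_IND` is guaranteed to also see the indirectee written before the updater's fence, which rules out the bad reordering listed above.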
* Fix GCC warnings with __clear_cache builtin (#16867) | Sylvain Henry | 2019-06-27 | 1 | -6/+8
* rts: Do not traverse nursery for dead closures in LDV profile | Matthew Pickering | 2019-06-27 | 1 | -23/+0
  It is important that `heapCensus` and `LdvCensusForDead` traverse the same areas. `heapCensus` increases the `not_used` counter, which tracks how many closures are live but haven't been used yet. `LdvCensusForDead` increases the `void_total` counter, which tracks how many dead closures there are. The `LAG` is then calculated by subtracting the `void_total` from `not_used`, and so it is essential that `not_used >= void_total`. This fact is checked by quite a few assertions.

  However, if a program has low maximum residency but allocates a lot in the nursery then these assertions were failing (see #16753 and #15903) because `LdvCensusForDead` was observing dead closures from the nursery which totalled more than the `not_used`. The same closures were not counted by `heapCensus`.

  Therefore, it seems that the correct fix is to make `LdvCensusForDead` agree with `heapCensus` and not traverse the nursery for dead closures.

  Fixes #16100 #16753 #15903 #8982
* rts: Correct assertion in LDV_recordDead | Matthew Pickering | 2019-06-27 | 1 | -1/+1
  It is possible that void_total is exactly equal to not_used, and the other assertions for this check for <= rather than <.
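  The invariant behind the LDV assertions can be sketched as follows. The counter names are taken from the commit messages above; the `CensusCounters` struct and `census_lag` function are hypothetical and are not the profiler's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-census counters, named after those in the commit messages. */
typedef struct {
    size_t not_used;    /* closures that are live but haven't been used yet */
    size_t void_total;  /* dead closures                                    */
} CensusCounters;

/* LAG = not_used - void_total. The subtraction is only meaningful when
 * void_total <= not_used; equality is legal, so the check is <=, not <. */
static size_t census_lag(const CensusCounters *c)
{
    assert(c->void_total <= c->not_used);
    return c->not_used - c->void_total;
}
```

  A census in which both counters are equal yields a LAG of zero and must not trip the assertion.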
* rts: Correct handling of LARGE ARR_WORDS in LDV profiler | Matthew Pickering | 2019-06-27 | 1 | -13/+2
  This implements the correct fix for #11627 by skipping over the slop (which is zeroed) rather than adding special case logic for LARGE ARR_WORDS, which runs the risk of not performing a correct census by ignoring any subsequent blocks. This approach implements similar logic to that in Sanity.c.
* [skip ci] add a blurb about the purpose of Printer.c | Siddharth Bhat | 2019-06-26 | 1 | -1/+2
* rts: Reset STATIC_LINK field of reverted CAFs | Ben Gamari | 2019-06-22 | 1 | -6/+11
  When we revert a CAF we must reset the STATIC_LINK field lest the GC ignore the CAF (e.g. as it carries the STATIC_FLAG_LIST flag) and consequently overlook references to object code that we are trying to unload. This would result in the reachable object code being unloaded. See Note [CAF lists] and Note [STATIC_LINK fields].

  This fixes #16842.

  Idea-due-to: Phuong Trinh <lolotp@fb.com>
* Fix #16525: ObjectCode freed wrongly because of lack of info header check | Phuong Trinh | 2019-06-13 | 3 | -1/+8
  `checkUnload` currently doesn't check the info header of static objects. Thus, it may free an `ObjectCode` struct wrongly even if there's still a live static object whose info header lies in a mapped section of that `ObjectCode`. This fixes the issue by adding an appropriate check.
* rts/linker: Only mprotect GOT after it is filled | Ben Gamari | 2019-06-12 | 1 | -2/+5
  This fixes a regression, introduced by 67c422ca, where we mprotect'd the global offset table (GOT) region to PROT_READ before we had finished filling it, resulting in a linker crash. Fixes #16779.
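  The ordering this fix restores (fill the table first, drop write permission afterwards) can be sketched with plain POSIX calls. `make_readonly_table` is a hypothetical helper, not the RTS linker's GOT code.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: map a writable region, fill it, and only then make
 * it read-only. Returns the mapping, or NULL on failure. */
static void *make_readonly_table(const void *entries, size_t len)
{
    assert(entries != NULL);
    if (len == 0) return NULL;

    void *table = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (table == MAP_FAILED) return NULL;

    memcpy(table, entries, len);                 /* fill while still writable */

    if (mprotect(table, len, PROT_READ) != 0) {  /* protect only afterwards   */
        munmap(table, len);
        return NULL;
    }
    return table;
}
```

  Swapping the `memcpy` and `mprotect` calls would reproduce the kind of failure described above: the write would hit a read-only page.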
* rts/linker: Make elf_got.c a bit more legible | Ben Gamari | 2019-06-12 | 1 | -1/+10
* Fix an error message in CheckUnload.c:searchHeapBlocks | Ömer Sinan Ağacan | 2019-06-11 | 1 | -1/+1
* rts/linker: Use mmapForLinker to map PLT | Ben Gamari | 2019-06-11 | 1 | -6/+2
  The PLT needs to be located within a close distance of the code calling it under the small memory model. Fixes #16784.
* rts/linker: Mmap into low memory on AArch64 | Ben Gamari | 2019-06-11 | 1 | -13/+22
  This extends mmapForLinker to use the same low-memory mapping strategy used on x86_64 on AArch64. See #16784.
* rts/RtsFlags.c: mention that -prof too enables support for +RTS -l | Alp Mestanogullari | 2019-06-11 | 1 | -1/+1
* rts: Fix RetainerProfile early return with TREC_CHUNK | Daniel Gröber | 2019-06-09 | 1 | -1/+1
  When pop() returns with `*c == NULL`, retainerProfile will immediately return. All other code paths in pop() continue with the next stackElement when this happens, so it seems weird to me that with TREC_CHUNK we would suddenly abort everything even though the stack might still have elements left to process.
* rts: Separate population of eventTypes from initial event generation | Ben Gamari | 2019-06-09 | 1 | -8/+17
  Previously these two orthogonal concerns were both implemented in postHeaderEvents, which made it difficult to send header events after RTS initialization.
* Fix two lint failures in rts/linker/MachO.c | Matthew Pickering | 2019-06-08 | 1 | -2/+2
* Add HEAP_PROF_SAMPLE_END event to mark end of samples | Matthew Pickering | 2019-06-07 | 5 | -0/+25
  This allows a user to observe how long a sampling period lasts so that the time taken can be removed from the profiling output. Fixes #16697
* rts: Remove unused decls from CNF.h | Ömer Sinan Ağacan | 2019-06-01 | 1 | -3/+0
* Remove unused RTS function 'unmark' | Ömer Sinan Ağacan | 2019-05-31 | 1 | -10/+0
* support small arrays and CONSTR_NOCAF in ghc-heap | David Hewson | 2019-05-31 | 1 | -0/+11
* Apply suggestion to rts/CheckUnload.c | Trịnh Tuấn Phương | 2019-05-30 | 1 | -1/+2
* Apply suggestion to rts/CheckUnload.c | Trịnh Tuấn Phương | 2019-05-30 | 1 | -1/+1
* Use binary search to speedup checkUnload | Phuong Trinh | 2019-05-30 | 1 | -32/+135
  We are iterating through all object code for each heap object when checking whether object code can be unloaded. For large projects in GHCi, this can be very expensive due to the large amount of object code that needs to be loaded/unloaded. To speed it up, this arranges all mapped sections of unloaded object code in a sorted array and uses binary search to check if an address location falls within them.
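  The technique described here (sort the section ranges once, then binary-search each address) can be sketched as follows. The `Section` record and both function names are hypothetical, and the sketch assumes the sections do not overlap; it is not the code in CheckUnload.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one mapped section, covering [start, end). */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} Section;

/* Order sections by start address. */
static int cmp_section(const void *a, const void *b)
{
    const Section *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Sort once, before any lookups. Assumes sections do not overlap. */
static void sort_sections(Section *secs, size_t n)
{
    qsort(secs, n, sizeof secs[0], cmp_section);
}

/* Binary search: does addr fall inside any section? O(log n) per lookup. */
static bool addr_in_sections(const Section *secs, size_t n, uintptr_t addr)
{
    size_t lo = 0, hi = n;                /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < secs[mid].start) {
            hi = mid;                     /* addr lies below this section */
        } else if (addr >= secs[mid].end) {
            lo = mid + 1;                 /* addr lies above this section */
        } else {
            return true;                  /* start <= addr < end */
        }
    }
    return false;
}
```

  Compared with scanning every section for every heap object, each lookup costs a logarithmic number of comparisons after a single sort.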
* rts: Handle zero-sized mappings in MachO linker | Ben Gamari | 2019-05-30 | 1 | -2/+6
  As noted in #16701, it is possible that we will find that an object has no segments needing to be mapped. Previously this would result in mmap being called for a zero-length mapping, which would fail. We now simply skip the mmap call in this case; the rest of the logic just works.
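  A minimal sketch of the guard described above, assuming POSIX `mmap` (which rejects a zero length). `map_image` is a hypothetical helper and is not the MachO linker's code.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc under strict -std modes */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `size` bytes, skipping the mmap call entirely
 * when there is nothing to map. On success, *out holds the mapping
 * (NULL for an empty one) and true is returned. */
static bool map_image(size_t size, void **out)
{
    assert(out != NULL);
    *out = NULL;
    if (size == 0) {
        return true;      /* mmap with length 0 fails, so do not call it */
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return false;
    }
    *out = p;
    return true;
}
```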
* CNF.c: Move debug functions behind ifdef | Ömer Sinan Ağacan | 2019-05-29 | 1 | -1/+1
* Fix padding of entries in .prof files | Jasper Van der Jeugt | 2019-05-27 | 1 | -1/+1
  When the number of entries of a cost centre reaches 11 digits, it takes up the whole space reserved for it and the prof file ends up looking like:

      ... no.        entries  %time %alloc   %time %alloc
          ...
      ... 120918     978250    0.0    0.0     0.0    0.0
      ... 118891          0    0.0    0.0    73.3   80.8
      ... 11890229702412351    8.9   13.5    73.3   80.8
      ... 118903  153799689    0.0    0.1     0.0    0.1
          ...

  This results in tooling not being able to parse the .prof file. I realise we have the JSON output as well now, but still it'd be good to fix this little weirdness.

  Original bug report and full prof file can be seen here: <https://github.com/jaspervdj/profiteur/issues/28>.
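  The fused column in the sample above (`118902` followed directly by `29702412351`) is what happens when a right-aligned field is filled completely and nothing separates it from its neighbour. The sketch below shows one way to keep such columns apart; the field widths and the `format_row` helper are assumptions for illustration, not the format strings used by the RTS.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical row formatter: an explicit space between the two fields
 * guarantees a separator even when `entries` fills all 11 characters. */
static int format_row(char *buf, size_t size, unsigned no, uint64_t entries)
{
    return snprintf(buf, size, "%6u %11" PRIu64, no, entries);
}
```

  With the separator in place, a parser can still split the row on whitespace when the entry count reaches 11 digits.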
* Add `keepCAFs` to RtsSymbols | Moritz Angermann | 2019-05-25 | 1 | -0/+1
* RTS: Fix restrictive cast | Alec Theriault | 2019-05-22 | 1 | -2/+2
  Commit e75a9afd2989e0460f9b49fa07c1667299d93ee9 added an `unsigned` cast to account for OSes that have a signed `rlim_t`. Unfortunately, the `unsigned` cast has the unintended effect of narrowing `rlim_t` to only 4 bytes. This leads to some spurious out of memory crashes (in particular: Haddock crashes with OOM when building docs of `ghc`-the-library).

  In this case, `W_` is a better type to cast to: we know it will be unsigned too and it has the same type as `*len` (so we don't suffer from accidental narrowing).
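  The narrowing described above can be reproduced in isolation. This sketch assumes a platform where `unsigned` is 32 bits wide and uses `int64_t`/`uint64_t` as stand-ins for a signed 64-bit `rlim_t` and a 64-bit `W_`; both function names are hypothetical.

```c
#include <stdint.h>

/* Casting through `unsigned` keeps only the low 32 bits (on platforms
 * where unsigned is 32 bits wide) before widening back to 64 bits. */
static uint64_t limit_via_unsigned(int64_t limit)
{
    return (unsigned)limit;
}

/* Casting to a word-sized unsigned type removes the sign without narrowing. */
static uint64_t limit_via_word(int64_t limit)
{
    return (uint64_t)limit;
}
```

  A limit of 8 GiB, for example, survives the word-sized cast but collapses to zero when it passes through a 32-bit `unsigned`.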
* Print PAP object address in stg_PAP_info entry code | Ömer Sinan Ağacan | 2019-05-08 | 1 | -1/+1
  Continuation to ce23451c
* PrimOps.cmm: remove unused stuff | Ömer Sinan Ağacan | 2019-05-03 | 1 | -6/+2
* rts: Properly free the RTSSummaryStats structure | Ömer Sinan Ağacan | 2019-05-03 | 1 | -6/+4
  `stat_exit` always allocates a `RTSSummaryStats` but only sometimes frees it, which causes leaks. With this patch we unconditionally free the structure, fixing the leak.

  Fixes #16584
* Minor RTS refactoring: | Ömer Sinan Ağacan | 2019-04-25 | 2 | -2/+2
  - Remove redundant casting in evacuate_static_object
  - Remove redundant parens in STATIC_LINK
  - Fix a typo in GC.c
* osReserveHeapMemory: handle signed rlim_t | Fraser Tweedale | 2019-04-23 | 1 | -2/+4
  rlim_t is a signed type on FreeBSD, and the build fails with a sign-compare error. Add explicit (unsigned) cast to handle this case.
* Restore Xmm registers properly in StgCRun.c | klebinger.andreas@gmx.at | 2019-04-04 | 1 | -9/+9
  This fixes #16514: Xmm6-15 was restored based off rax instead of rsp. The code was introduced in the fix for #14619.
* Improve performance of newSmallArray# | Michal Terepeta | 2019-04-01 | 2 | -4/+7
  This:
  - Hoists part of the condition outside of the initialization loop in `stg_newSmallArrayzh`.
  - Annotates one of the unlikely branches as unlikely, also in `stg_newSmallArrayzh`.
  - Adds a couple of annotations to `allocateMightFail` indicating which branches are likely to be taken.

  Together this gives about 5% improvement.

  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
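  Branch annotations of the kind mentioned above are commonly written in C with the GCC/Clang `__builtin_expect` builtin. The sketch below is a generic illustration under that assumption; the `UNLIKELY` macro and `new_pointer_array` are hypothetical and do not reproduce `stg_newSmallArrayzh` (which is Cmm) or `allocateMightFail`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical local macro: hint that a condition is rarely true.
 * Falls back to a plain test on compilers without the builtin. */
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Allocate an array of n pointers, each initialised to `init`.
 * The failure checks are marked as the rare path, and they sit outside
 * the initialisation loop so the loop body stays branch-free. */
static void **new_pointer_array(size_t n, void *init)
{
    if (UNLIKELY(n == 0 || n > SIZE_MAX / sizeof(void *))) {
        return NULL;
    }
    void **arr = malloc(n * sizeof *arr);
    if (UNLIKELY(arr == NULL)) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        arr[i] = init;
    }
    return arr;
}
```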
* Update Wiki URLs to point to GitLab | Takenobu Tani | 2019-03-25 | 30 | -34/+34
  This moves all URL references to Trac Wiki to their corresponding GitLab counterparts. This substitution is classified as follows:

  1. Automated substitution using sed with Ben's mapping rule [1]
     Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...
     New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...
  2. Manual substitution for URLs containing `#` index
     Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz
     New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz
  3. Manual substitution for strings starting with `Commentary`
     Old: Commentary/XxxYyy...
     New: commentary/xxx-yyy...

  See also !539

  [1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
* Gracefully handle error condition in Mach-O relocateSection | Artem Pyanykh | 2019-03-20 | 1 | -1/+6
* Directly test section alignment, fix internal reloc probing length | Artem Pyanykh | 2019-03-20 | 1 | -2/+6
* Add missing levels to SegmentProt enum | Artem Pyanykh | 2019-03-20 | 2 | -4/+19
* Address some todos and fixmes | Artem Pyanykh | 2019-03-20 | 3 | -21/+22
* Use segments for section layout | Artem Pyanykh | 2019-03-20 | 6 | -94/+296
* Adjust section placement and relocation logic for Mach-O | Artem Pyanykh | 2019-03-20 | 4 | -123/+234
  1. Place each section on a separate page to ensure required alignment (wastes lots of space, needs to be improved).
  2. Unwire relocation logic from macho sections (the most fiddly part is adjusting internal relocations).

  Other todos:
  0. Add a test for section alignment.
  1. Investigate 32bit relocations!
  2. Fix memory leak in ZEROPAGE section allocation.
  3. Fix creating redundant jump islands for GOT.
  4. Investigate more compact section placement.
* rts/RtsSymbols: Drop __mingw_vsnwprintf | Ben Gamari | 2019-03-20 | 1 | -1/+0
  As described in #16387, this is already defined by mingw and consequently defining it in the RTS as well leads to multiple definition errors from the RTS linker at runtime.
* err: clean up error handler | Tamar Christina | 2019-03-19 | 1 | -19/+30
* ghc-heap: Introduce closureSize | Ben Gamari | 2019-03-17 | 2 | -0/+8
  This function allows the user to compute the (non-transitive) size of a heap object in words. The "closure" in the name is admittedly confusing but we are stuck with this nomenclature at this point.
* Update Trac ticket URLs to point to GitLab | Ryan Scott | 2019-03-15 | 14 | -18/+18
  This moves all URL references to Trac tickets to their corresponding GitLab counterparts.