diff options
author | Ömer Sinan Ağacan <omeragacan@gmail.com> | 2018-09-06 15:52:53 +0300 |
---|---|---|
committer | Ömer Sinan Ağacan <omeragacan@gmail.com> | 2018-09-06 15:53:03 +0300 |
commit | c6fbac6a6a69a2f4be89701b2c386ae53214f9a3 (patch) | |
tree | 5b1084c9a64221971f9e2bd0e30760214e492719 /rts/sm | |
parent | 16bc7ae8b191153071b5fd1dde2b02e51171860e (diff) | |
download | haskell-c6fbac6a6a69a2f4be89701b2c386ae53214f9a3.tar.gz |
Fix a race between GC threads in concurrent scavenging
While debugging #15285 I realized that free block lists (free_list in
BlockAlloc.c) get corrupted when multiple scavenge threads allocate and
release blocks concurrently. Here's a picture of one such race:
Thread 2 (Thread 32573.32601):
#0 check_tail
(bd=0x940d40 <stg_TSO_info>) at rts/sm/BlockAlloc.c:860
#1 0x0000000000928ef7 in checkFreeListSanity
() at rts/sm/BlockAlloc.c:896
#2 0x0000000000928979 in freeGroup
(p=0x7e998ce02880) at rts/sm/BlockAlloc.c:721
#3 0x0000000000928a17 in freeChain
(bd=0x7e998ce02880) at rts/sm/BlockAlloc.c:738
#4 0x0000000000926911 in freeChain_sync
(bd=0x7e998ce02880) at rts/sm/GCUtils.c:80
#5 0x0000000000934720 in scavenge_capability_mut_lists
(cap=0x1acae80) at rts/sm/Scav.c:1665
#6 0x000000000092b411 in gcWorkerThread
(cap=0x1acae80) at rts/sm/GC.c:1157
#7 0x000000000090be9a in yieldCapability
(pCap=0x7f9994e69e20, task=0x7e9984000b70, gcAllowed=true) at rts/Capability.c:861
#8 0x0000000000906120 in scheduleYield
(pcap=0x7f9994e69e50, task=0x7e9984000b70) at rts/Schedule.c:673
#9 0x0000000000905500 in schedule
(initialCapability=0x1acae80, task=0x7e9984000b70) at rts/Schedule.c:293
#10 0x0000000000908d4f in scheduleWorker
(cap=0x1acae80, task=0x7e9984000b70) at rts/Schedule.c:2554
#11 0x000000000091a30a in workerStart
(task=0x7e9984000b70) at rts/Task.c:444
#12 0x00007f99937fa6db in start_thread
(arg=0x7f9994e6a700) at pthread_create.c:463
#13 0x000061654d59f88f in clone
() at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 1 (Thread 32573.32573):
#0 checkFreeListSanity
() at rts/sm/BlockAlloc.c:887
#1 0x0000000000928979 in freeGroup
(p=0x7e998d303540) at rts/sm/BlockAlloc.c:721
#2 0x0000000000926f23 in todo_block_full
(size=513, ws=0x1aa8ce0) at rts/sm/GCUtils.c:264
#3 0x00000000009583b9 in alloc_for_copy
(size=513, gen_no=0) at rts/sm/Evac.c:80
#4 0x000000000095850d in copy_tag_nolock
(p=0x7e998c675f28, info=0x421d98 <Main_Large_con_info>, src=0x7e998d075d80, size=513,
gen_no=0, tag=1) at rts/sm/Evac.c:153
#5 0x0000000000959177 in evacuate
(p=0x7e998c675f28) at rts/sm/Evac.c:715
#6 0x0000000000932388 in scavenge_small_bitmap
(p=0x7e998c675f28, size=1, bitmap=0) at rts/sm/Scav.c:271
#7 0x0000000000934aaf in scavenge_stack
(p=0x7e998c675f28, stack_end=0x7e998c676000) at rts/sm/Scav.c:1908
#8 0x0000000000934295 in scavenge_one
(p=0x7e998c66e000) at rts/sm/Scav.c:1466
#9 0x0000000000934662 in scavenge_mutable_list
(bd=0x7e998d300440, gen=0x1b1d880) at rts/sm/Scav.c:1643
#10 0x0000000000934700 in scavenge_capability_mut_lists
(cap=0x1aaa340) at rts/sm/Scav.c:1664
#11 0x00000000009299b6 in GarbageCollect
(collect_gen=0, do_heap_census=false, gc_type=2, cap=0x1aaa340, idle_cap=0x1b38aa0)
at rts/sm/GC.c:378
#12 0x0000000000907a4a in scheduleDoGC
(pcap=0x7ffdec5b5310, task=0x1b36650, force_major=false) at rts/Schedule.c:1798
#13 0x0000000000905de7 in schedule
(initialCapability=0x1aaa340, task=0x1b36650) at rts/Schedule.c:546
#14 0x0000000000908bc4 in scheduleWaitThread
(tso=0x7e998c0067c8, ret=0x0, pcap=0x7ffdec5b5430) at rts/Schedule.c:2537
#15 0x000000000091b5a0 in rts_evalLazyIO
(cap=0x7ffdec5b5430, p=0x9c11f0, ret=0x0) at rts/RtsAPI.c:530
#16 0x000000000091ca56 in hs_main
(argc=1, argv=0x7ffdec5b5628, main_closure=0x9c11f0, rts_config=...) at rts/RtsMain.c:72
#17 0x0000000000421ea0 in main
()
In particular, dbl_link_onto() which is used to add a freed block to a
doubly-linked free list is not thread safe and corrupts the list when
called concurrently.
Note that thread 1 is to blame here as thread 2 is properly taking the
spinlock. With this patch we now take the spinlock when freeing a todo
block in GC, avoiding this race.
Test Plan:
- Tried slow validate locally: this patch does not introduce new failures.
- circleci: https://circleci.com/gh/ghc/ghc-diffs/283 The test got killed
because it took 5 hours but T7919 (which was previously failing on circleci)
passed.
Reviewers: simonmar, bgamari, erikd
Reviewed By: simonmar
Subscribers: rwbarton, carter
GHC Trac Issues: #15285
Differential Revision: https://phabricator.haskell.org/D5115
Diffstat (limited to 'rts/sm')
-rw-r--r-- | rts/sm/GCUtils.c | 10 | ||||
-rw-r--r-- | rts/sm/GCUtils.h | 1 |
2 files changed, 10 insertions, 1 deletions
diff --git a/rts/sm/GCUtils.c b/rts/sm/GCUtils.c index 24f7b5e61e..31b2913a37 100644 --- a/rts/sm/GCUtils.c +++ b/rts/sm/GCUtils.c @@ -81,6 +81,14 @@ freeChain_sync(bdescr *bd) RELEASE_SPIN_LOCK(&gc_alloc_block_sync); } +void +freeGroup_sync(bdescr *bd) +{ + ACQUIRE_SPIN_LOCK(&gc_alloc_block_sync); + freeGroup(bd); + RELEASE_SPIN_LOCK(&gc_alloc_block_sync); +} + /* ----------------------------------------------------------------------------- Workspace utilities -------------------------------------------------------------------------- */ @@ -261,7 +269,7 @@ todo_block_full (uint32_t size, gen_workspace *ws) // object. However, if the object we're copying is // larger than a block, then we might have an empty // block here. - freeGroup(bd); + freeGroup_sync(bd); } else { push_scanned_block(bd, ws); } diff --git a/rts/sm/GCUtils.h b/rts/sm/GCUtils.h index 3c174491ce..8b6040769e 100644 --- a/rts/sm/GCUtils.h +++ b/rts/sm/GCUtils.h @@ -31,6 +31,7 @@ INLINE_HEADER bdescr *allocBlockOnNode_sync(uint32_t node) } void freeChain_sync(bdescr *bd); +void freeGroup_sync(bdescr *bd); void push_scanned_block (bdescr *bd, gen_workspace *ws); StgPtr todo_block_full (uint32_t size, gen_workspace *ws); |