Fix a lost-wakeup bug in BLACKHOLE handling (#13751)

Summary:
The problem occurred when
* Threads A & B evaluate the same thunk
* Thread A context-switches, so the thunk gets blackholed
* Thread C enters the blackhole, creates a BLOCKING_QUEUE attached to
  the blackhole and thread A's `tso->bq` queue
* Thread B updates the blackhole with a value, overwriting the
  BLOCKING_QUEUE
* We GC, replacing A's update frame with stg_enter_checkbh
* Throw an exception in A, which ignores the stg_enter_checkbh frame

Now we have C blocked on A's tso->bq queue, but we forgot to check the
queue because the stg_enter_checkbh frame has been thrown away by the
exception.

The solution and alternative designs are discussed in Note
[upd-black-hole].

This also exposed a bug in the interpreter, whereby we were sometimes
context-switching without calling `threadPaused()`. I've fixed this
and added some Notes.

Test Plan:
* `cd testsuite/tests/concurrent && make slow`
* validate

Reviewers: niteria, bgamari, austin, erikd

Reviewed By: erikd

Subscribers: rwbarton, thomie

GHC Trac Issues: #13751

Differential Revision: https://phabricator.haskell.org/D3630
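
The interleaving in the summary can be replayed in a small single-threaded toy model. This is only a sketch to make the ordering concrete: the types `Thread`, `Thunk`, `FrameKind` and the functions `check_blocking_queue` and `c_stays_blocked` are hypothetical stand-ins and do not correspond to the real RTS structures or code paths. It assumes, as the summary implies, that entering the stg_enter_checkbh frame normally would have checked A's queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins, not the RTS's real types. */
typedef enum { THUNK, BLACKHOLE, BLOCKING_QUEUE, VALUE } ClosureState;
typedef enum { NO_FRAME, UPDATE_FRAME, ENTER_CHECKBH_FRAME } FrameKind;

typedef struct Thread {
    FrameKind frame;           /* the one stack frame that refers to the thunk */
    struct Thread *bq_waiter;  /* models tso->bq, holding at most one waiter */
    bool blocked;
} Thread;

typedef struct {
    ClosureState state;
    Thread *owner;             /* thread that blackholed the thunk */
} Thunk;

/* Wake whoever is waiting on this thread's queue (models checking tso->bq). */
static void check_blocking_queue(Thread *t) {
    if (t->bq_waiter != NULL) {
        t->bq_waiter->blocked = false;
        t->bq_waiter = NULL;
    }
}

/* Replays the summary's steps; returns true if C is still blocked at the end. */
bool c_stays_blocked(bool exception_in_a) {
    Thread a = { .frame = UPDATE_FRAME, .bq_waiter = NULL, .blocked = false };
    Thread b = { .frame = UPDATE_FRAME, .bq_waiter = NULL, .blocked = false };
    Thread c = { .frame = NO_FRAME,     .bq_waiter = NULL, .blocked = false };
    Thunk thunk = { .state = THUNK, .owner = NULL };

    /* A context-switches, so the thunk gets blackholed and A owns it. */
    thunk.state = BLACKHOLE;
    thunk.owner = &a;

    /* C enters the blackhole and blocks on the owner's queue. */
    thunk.state = BLOCKING_QUEUE;
    thunk.owner->bq_waiter = &c;
    c.blocked = true;

    /* B updates the thunk with a value, overwriting the BLOCKING_QUEUE.
       B does not own the queue, so nobody is woken here. */
    assert(thunk.state == BLOCKING_QUEUE);
    thunk.state = VALUE;
    b.frame = NO_FRAME;

    /* GC: the thunk is no longer A's blackhole, so A's update frame is
       replaced with the "check the blocking queue" frame. */
    a.frame = ENTER_CHECKBH_FRAME;

    if (exception_in_a) {
        /* The exception unwinds A's stack and discards the frame
           without running it: this is the lost wakeup. */
        a.frame = NO_FRAME;
    } else {
        /* Normal return: the frame runs and checks A's queue. */
        check_blocking_queue(&a);
        a.frame = NO_FRAME;
    }
    return c.blocked;
}
```

In this model `c_stays_blocked(true)` returns true (C is never woken), while `c_stays_blocked(false)` returns false. The real fix is not modelled here; it is described in Note [upd-black-hole].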
author: Simon Marlow <marlowsd@gmail.com> 2017-06-03 20:26:13 +0100
committer: Simon Marlow <marlowsd@gmail.com> 2017-06-08 08:38:09 +0100
commit: 598472908ebb08f6811b892f285490554c290ae3 (patch)
tree: 84079aceefe1a7eacda507f104c0f0d4d8c12417 /rts/Interpreter.c
parent: bca56bd040de64315564cdac4b7e943fc8a75ea0 (diff)
download: haskell-598472908ebb08f6811b892f285490554c290ae3.tar.gz
1 files changed, 10 insertions, 0 deletions
diff --git a/rts/Interpreter.c b/rts/Interpreter.c
index 4926d1dab5..1a883a5b4b 100644
--- a/rts/Interpreter.c
+++ b/rts/Interpreter.c
@@ -115,6 +115,16 @@
    cap->r.rRet = (retcode);                     \
    return cap;
 
+// Note [avoiding threadPaused]
+//
+// Switching between the interpreter and compiled code can happen very
+// frequently, so we don't want to call threadPaused(), which is
+// expensive.  BUT we must be careful not to violate the invariant
+// that threadPaused() has been called on all threads before we GC
+// (see Note [upd-black-hole]).  So the scheduler must ensure that when
+// we return in this way, we immediately run the thread again and
+// don't GC or do something else.
+//
 #define RETURN_TO_SCHEDULER_NO_PAUSE(todo,retcode)      \
    SAVE_THREAD_STATE();                                 \
    cap->r.rCurrentTSO->what_next = (todo);              \