author | Simon Marlow <marlowsd@gmail.com> | 2017-06-03 20:26:13 +0100 |
---|---|---|
committer | Simon Marlow <marlowsd@gmail.com> | 2017-06-08 08:38:09 +0100 |
commit | 598472908ebb08f6811b892f285490554c290ae3 (patch) | |
tree | 84079aceefe1a7eacda507f104c0f0d4d8c12417 /rts/Interpreter.c | |
parent | bca56bd040de64315564cdac4b7e943fc8a75ea0 (diff) | |
Fix a lost-wakeup bug in BLACKHOLE handling (#13751)
Summary:
The problem occurred when
* Threads A & B evaluate the same thunk
* Thread A context-switches, so the thunk gets blackholed
* Thread C enters the blackhole, creates a BLOCKING_QUEUE attached to
the blackhole and thread A's `tso->bq` queue
* Thread B updates the blackhole with a value, overwriting the BLOCKING_QUEUE
* We GC, replacing A's update frame with stg_enter_checkbh
* Throw an exception in A, which ignores the stg_enter_checkbh frame
Now we have C blocked on A's tso->bq queue, but we forgot to check the
queue because the stg_enter_checkbh frame has been thrown away by the
exception.
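The sequence above can be replayed in a small toy model. This is only a sketch: `ToyThread`, `toy_block_on`, `toy_replay` and the other `toy_*` names are hypothetical and are not the RTS data structures or functions. It assumes that returning normally through the `stg_enter_checkbh` frame is what checks thread A's queue, as the summary implies, and it shows that discarding the frame via an exception leaves C blocked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these are NOT the GHC RTS structures. */
typedef struct ToyThread {
    bool blocked;               /* true while waiting on a blackhole */
    struct ToyThread *bq_head;  /* waiters on thunks we own (stands in for tso->bq) */
    struct ToyThread *bq_next;  /* link inside the owner's queue */
    bool has_check_frame;       /* stands in for the stg_enter_checkbh frame */
} ToyThread;

/* `waiter` enters a blackhole owned by `owner` and is queued on the owner. */
static void toy_block_on(ToyThread *owner, ToyThread *waiter) {
    assert(owner != waiter);
    waiter->blocked = true;
    waiter->bq_next = owner->bq_head;
    owner->bq_head = waiter;
}

/* Wake every thread on the owner's queue. */
static void toy_wake_queue(ToyThread *owner) {
    while (owner->bq_head != NULL) {
        ToyThread *t = owner->bq_head;
        owner->bq_head = t->bq_next;
        t->bq_next = NULL;
        t->blocked = false;
    }
}

/* Normal return: the check frame runs and the queue is checked. */
static void toy_return_normally(ToyThread *owner) {
    if (owner->has_check_frame) {
        owner->has_check_frame = false;
        toy_wake_queue(owner);
    }
}

/* Exception: the check frame is discarded without checking the queue. */
static void toy_raise(ToyThread *owner) {
    owner->has_check_frame = false;
}

/* Replays the scenario; returns true if C is still blocked at the end. */
static bool toy_replay(bool raise_exception) {
    ToyThread a = {0}, c = {0};
    toy_block_on(&a, &c);      /* C enters the blackhole, queued on A */
    /* B overwrites the BLOCKING_QUEUE with a value: nobody wakes C here. */
    a.has_check_frame = true;  /* GC swaps A's update frame for the check frame */
    if (raise_exception) {
        toy_raise(&a);
    } else {
        toy_return_normally(&a);
    }
    return c.blocked;
}
```

Calling `toy_replay(true)` models the lost wakeup (C never wakes), while `toy_replay(false)` models the path where the frame survives and C is woken.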
The solution and alternative designs are discussed in Note [upd-black-hole].
This also exposed a bug in the interpreter, whereby we were sometimes
context-switching without calling `threadPaused()`. I've fixed this
and added some Notes.
Test Plan:
* `cd testsuite/tests/concurrent && make slow`
* validate
Reviewers: niteria, bgamari, austin, erikd
Reviewed By: erikd
Subscribers: rwbarton, thomie
GHC Trac Issues: #13751
Differential Revision: https://phabricator.haskell.org/D3630
Diffstat (limited to 'rts/Interpreter.c')
-rw-r--r-- | rts/Interpreter.c | 10 |
1 files changed, 10 insertions, 0 deletions
diff --git a/rts/Interpreter.c b/rts/Interpreter.c
index 4926d1dab5..1a883a5b4b 100644
--- a/rts/Interpreter.c
+++ b/rts/Interpreter.c
@@ -115,6 +115,16 @@
     cap->r.rRet = (retcode); \
     return cap;
 
+// Note [avoiding threadPaused]
+//
+// Switching between the interpreter to compiled code can happen very
+// frequently, so we don't want to call threadPaused(), which is
+// expensive. BUT we must be careful not to violate the invariant
+// that threadPaused() has been called on all threads before we GC
+// (see Note [upd-black-hole]. So the scheduler must ensure that when
+// we return in this way that we definitely immediately run the thread
+// again and don't GC or do something else.
+//
 #define RETURN_TO_SCHEDULER_NO_PAUSE(todo,retcode) \
     SAVE_THREAD_STATE(); \
     cap->r.rCurrentTSO->what_next = (todo); \
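The invariant described in Note [avoiding threadPaused] can be illustrated with a toy model. This is a sketch under stated assumptions, not the scheduler's real logic: `ToyTso`, `toy_leave`, `toy_gc_is_safe` and `toy_after_return` are hypothetical names, and a single boolean stands in for "threadPaused() has been called on this thread since it last ran".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins, not RTS types. */
typedef enum { TOY_RET_PAUSED, TOY_RET_NO_PAUSE } ToyReturnKind;
typedef enum { TOY_RERUN_SAME_THREAD, TOY_MAY_GC_OR_SWITCH } ToyNextAction;

typedef struct {
    bool paused_called;  /* has the threadPaused() stand-in run since the thread last ran? */
} ToyTso;

/* Stand-in for the expensive threadPaused() call. */
static void toy_thread_paused(ToyTso *t) {
    t->paused_called = true;
}

/* A thread stops running. A cheap interpreter/compiled-code switch
 * skips the pause step; every other return performs it. */
static ToyReturnKind toy_leave(ToyTso *t, bool cheap_switch) {
    assert(t != NULL);
    if (cheap_switch) {
        t->paused_called = false;
        return TOY_RET_NO_PAUSE;
    }
    toy_thread_paused(t);
    return TOY_RET_PAUSED;
}

/* The invariant: GC may only start once every thread has been paused. */
static bool toy_gc_is_safe(const ToyTso *threads, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (!threads[i].paused_called) {
            return false;
        }
    }
    return true;
}

/* What the scheduler is allowed to do after a thread returns to it. */
static ToyNextAction toy_after_return(ToyReturnKind kind) {
    return kind == TOY_RET_NO_PAUSE ? TOY_RERUN_SAME_THREAD
                                    : TOY_MAY_GC_OR_SWITCH;
}
```

In this model a no-pause return leaves `toy_gc_is_safe` false, so the only permitted next action is to rerun the same thread; a paused return restores the invariant and lets the scheduler GC or switch.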