summaryrefslogtreecommitdiff
path: root/lib/dev
diff options
context:
space:
mode:
authorNathan Froyd <froydnj@mozilla.com>2018-05-30 10:29:53 -0400
committerNathan Froyd <froydnj@mozilla.com>2018-05-30 10:29:53 -0400
commitcea9c7ec22e0483a4d5d03245cd03f443aeb752b (patch)
treee87dae5a1a26b3873c54a00224f8655c80c3f7ac /lib/dev
parent08822cb4e64e092862a23d6017958bcf5c4fc29e (diff)
downloadnss-hg-cea9c7ec22e0483a4d5d03245cd03f443aeb752b.tar.gz
Bug 1461731 - wait indefinitely in nssSlot_IsTokenPresent; r=rrelyea,fkiefer
Crashes for a particular hang have been spiking in the last month, and all the crashes are associated with macOS 10.12 and 10.13. The crashes look like this: Thread 1: waiting on a condition variable in nssSlot_IsTokenPresent Thread 2: waiting on the (contended) lock in nssSlot_IsTokenPresent Thread 3: waiting on the (contended) lock in nssSlot_IsTokenPresent Thread 2 and 3 are waiting on the lock associated with the condition variable that thread 1 is holding. One would expect that thread 1 would drop the lock associated with the condition variable when the wait occurs, and enable thread 2 or thread 3 to make progress. But the particular wait in question passes a (relative) timeout of zero (which corresponds to what would be PR_INTERVAL_NO_WAIT), which is unusual in NSS code and condition variable-using programs in general. A relative timeout of zero on macOS needs to be translated to an absolute time for the underlying API, pthread_cond_timedwait. What appears to be happening is that some absolute time, $NOW, is determined before calling pthread_cond_timedwait. We then call into pthread_cond_timedwait and do whatever work we need to do before checking whether the specified time ($NOW) has passed. Of course it has; we are at some time $NOW + epsilon, and so the wait times out. The wait appears to time out without the lock ever being released; if the lock was released, even if ever-so-shortly, presumably one of the other threads would be able to make progress. Since the hang only occurs on macOS 10.12 and 10.13, we are assuming that there was some change in the condition variable code that attempts to optimize extremely short timeouts, or treats timeouts of zero differently (even if inadvertently). The other possibility is this is the way macOS has always worked, and the crash data we have is only for those versions of the operating system. In any event, there's no need to specify a timeout of zero here. We can specify an "infinite" wait instead (PR_INTERVAL_NO_TIMEOUT) and let another thread make progress, waking us up when it is done.
Diffstat (limited to 'lib/dev')
-rw-r--r--lib/dev/devslot.c2
1 files changed, 1 insertions, 1 deletions
diff --git a/lib/dev/devslot.c b/lib/dev/devslot.c
index ef941f9ed..ebd6e6aa5 100644
--- a/lib/dev/devslot.c
+++ b/lib/dev/devslot.c
@@ -153,7 +153,7 @@ nssSlot_IsTokenPresent(
/* set up condition so only one thread is active in this part of the code at a time */
PZ_Lock(slot->isPresentLock);
while (slot->isPresentThread) {
- PR_WaitCondVar(slot->isPresentCondition, 0);
+ PR_WaitCondVar(slot->isPresentCondition, PR_INTERVAL_NO_TIMEOUT);
}
/* if we were one of multiple threads here, the first thread will have
* given us the answer, no need to make more queries of the token. */