Deadlock in Watchdog termination
When watchdog terminates, there is a rare chance that a deadlock could occur. This will happen if:
1. watchdog.alive is called.
2. The timeout elapses and the watcher wakes.
3. Either watchdog.alive isn't called again or the watcher doesn't detect it before the next step.
4. watchdog.terminate is called, thus signaling the timer.
5. Meanwhile, the watcher is still waiting for the core to be alive and eventually attempting recovery.
I wasn't able to reproduce this in the way reported, but I'm pretty sure this is the issue. It is very sensitive to timing.
The solution is to make watchdog._isAlive always return True if the watchdog has been terminated.
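As a rough illustration of that fix, here is a minimal sketch in Python. It is a toy model, not NVDA's actual watchdog code: the class TerminationModel, its is_running flag, and the helper timeout_elapsed are hypothetical names, and the real implementation works with Windows timer handles rather than a threading.Event.

```python
import threading


class TerminationModel:
    """Toy stand-in for the watchdog state (hypothetical, not NVDA's code)."""

    def __init__(self):
        self.is_running = True
        # Set by alive(); cleared when the timeout elapses.
        self._alive_event = threading.Event()

    def alive(self):
        # The core reports that it is still responsive.
        self._alive_event.set()

    def timeout_elapsed(self):
        # Simulates the timeout expiring: the last alive() no longer counts.
        self._alive_event.clear()

    def terminate(self):
        self.is_running = False

    def _is_alive(self):
        # Once terminated, always report alive so the watcher stops
        # waiting for the core and can exit instead of deadlocking.
        return (not self.is_running) or self._alive_event.is_set()


def watcher_should_keep_waiting(watchdog):
    # The watcher only keeps waiting (and attempting recovery)
    # while the core looks dead.
    return not watchdog._is_alive()
```

In this model, after alive() followed by timeout_elapsed(), the watcher keeps waiting; calling terminate() makes _is_alive() return True, so the watcher's wait ends.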
Inability to force quit/restart if NVDA freezes during exit
If watchdog freezes during termination as described above, trying to force quit or restart NVDA (e.g. by pressing control+alt+n) doesn't work. The problem is that we destroy the window that NVDA uses to detect a previous copy of NVDA before terminating watchdog. So, when watchdog termination freezes, the new copy of NVDA can't find the previous copy to kill it. This could also happen if anything else freezes during termination after the window is destroyed.
The solution is to destroy the window as late as possible.
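The ordering can be sketched as follows. This is only an illustration under assumptions: terminate_all and the step callables are hypothetical names, not functions from NVDA.

```python
def terminate_all(steps, destroy_window):
    """Run every termination step, then destroy the detection window last.

    If a step freezes, destroy_window has not run yet, so a new copy
    can still find the window and kill the frozen process.
    """
    try:
        for step in steps:
            step()
    finally:
        # Reached only after the steps have returned or raised.
        destroy_window()


# Usage: record the order in which the (hypothetical) steps run.
order = []
terminate_all(
    [lambda: order.append("watchdog"), lambda: order.append("other components")],
    lambda: order.append("window"),
)
print(order)
```

The window is the last entry in the recorded order, which is the property the fix relies on.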
Recovery attempts after core goes to sleep
The following could happen:
1. watchdog.alive is called.
2. The timeout elapses, so the core is treated as dead.
3. The watcher wakes, detecting core death.
4. watchdog.asleep is called.
In this case, the watcher won't stop recovery attempts until watchdog.alive is called again, even though watchdog.asleep was called (indicating that the core is now asleep, not dead).
I haven't been able to reproduce this in practice, but I'm pretty sure it's possible.
The solution is to call watchdog.alive at the top of watchdog.asleep to "reset" the watchdog. Note that CancelWaitableTimer does not reset the timer to unsignaled if it was signaled, which is the case here. Calling watchdog.alive will do this because it calls SetWaitableTimer.
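The timer behaviour described here can be modelled with a small Python sketch. FakeWaitableTimer and SleepModel are hypothetical stand-ins that track only a signaled flag. They do not call the real Windows API, and the assumed behaviour of SetWaitableTimer and CancelWaitableTimer is taken from the description above.

```python
class FakeWaitableTimer:
    """Models only the signaled/active flags of a waitable timer (a toy)."""

    def __init__(self):
        self.signaled = False
        self.active = False

    def set(self):
        # Stand-in for SetWaitableTimer: re-arms and clears the signaled state.
        self.signaled = False
        self.active = True

    def cancel(self):
        # Stand-in for CancelWaitableTimer: stops the timer but leaves
        # the signaled state untouched.
        self.active = False

    def fire(self):
        # Simulates the timeout elapsing on an armed timer.
        if self.active:
            self.signaled = True
            self.active = False


class SleepModel:
    """Toy watchdog showing why asleep() should call alive() first."""

    def __init__(self):
        self.timer = FakeWaitableTimer()

    def alive(self):
        self.timer.set()

    def asleep_without_reset(self):
        # Old behaviour: cancel only, so a signaled timer stays signaled.
        self.timer.cancel()

    def asleep(self):
        # Fixed behaviour: reset the timer first, then cancel it.
        self.alive()
        self.timer.cancel()

    def core_looks_dead(self):
        return self.timer.signaled
```

In this model, once the timer has fired, asleep_without_reset() leaves core_looks_dead() returning True, whereas asleep() clears it.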
Comment 2 by jteh on 2015-07-01 01:50
Ronan, please try the "next" snapshot for 1 July (which will be available in about 8 hours) and report whether this fixes the issue you reported. A temporary/portable copy is fine. Thanks.
Comment 3 by James Teh <jamie@... on 2015-07-15 07:46
In [8e912ec]:
Fix a rare deadlock in watchdog termination, inability to force quit/restart NVDA if it freezes during exit and another rare watchdog edge case.
See the code comments and/or the ticket for full details.
Fixes #5189.
Reported by jteh on 2015-07-01 01:32
This ticket covers several issues I discovered as a result of the problem reported in this thread.