Deadlock in NVDA Process Injection #4888

Closed
nvaccessAuto opened this issue Feb 6, 2015 · 8 comments

Comments

@nvaccessAuto

Reported by camlorn on 2015-02-06 04:08
Hi,
So. Let me emphasize that this is still a maybe. A pretty sure maybe, but a maybe nonetheless. Anyhow.

I'm working with Pyglet, a framework for game development in Python. It's OpenGL-based and includes a lot of functionality. Merely importing the modules pyglet and pyglet.app in a Python script causes a deadlock which appears to be in NVDA's process injection code. I've attached stack tracebacks, as the debug log is long.

My question at the moment is this: what events specifically trigger NVDA's injection routines? As far as I can tell, importing these modules does nothing except load some DLLs, yet two threads end up going through NVDA. The second appears to have something to do with the graphics driver. The end result is a deadlock involving LdrpLoaderLock and inprocThreadsLock; Google reveals little about the former, and the latter is in NVDA.

Any suggestions? I'd love to solve this, as otherwise it's back to maintaining my own package that does less than the one I want to use.

@nvaccessAuto

nvaccessAuto commented Feb 6, 2015

Attachment debug.txt added by camlorn on 2015-02-06 04:08
Description:
Stack tracebacks and critical section info
Update:
File added from Trac
debug.txt

@nvaccessAuto

Comment 1 by mdcurran on 2015-02-06 04:29
My guess is that Windows' DLL loading infrastructure is deadlocking because two DLL loads are trying to occur at the same time. The big rule in Windows DLL loading is that DllMain should do as little as possible. It should certainly not cause any other DLL to be loaded or unloaded.

From your stack I can see that the ig7icd32 module (whatever that is) is running its DllMain function, but somehow a Windows hook is running inside it. Perhaps that DllMain created a window and sent a message. But the stack around there may not be accurate, as the warning states.

I don't think that DllMain call is deliberately loading extra DLLs, but it is perhaps sending a window message, which causes one of NVDA's Windows hooks to run. This hook tries to acquire NVDA's inprocThreadsLock.

At the same time, possibly because DllMain created a window or something similar, NVDA's winEvent callback is also running, and as part of its work it tries to register more Windows hooks. In doing that, SetWindowsHookEx tries to increase the reference count on the DLL containing the hook (nvdaHelper) by calling GetModuleFileNameEx or something similar. But as ig7icd32's DllMain is currently running, this call must wait until DllMain finishes. DllMain cannot finish, however, because it is waiting for inprocThreadsLock, which the winEvent callback is currently holding.
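The cycle described above is a classic lock-order inversion: each thread holds one lock and waits for the one the other holds. The following is a minimal Python model of it, not NVDA's actual code (nvdaHelper is native code). `loader_lock` and `inproc_threads_lock` are hypothetical stand-ins for LdrpLoaderLock and inprocThreadsLock, and acquire timeouts replace the real infinite waits so the demo reports the deadlock instead of hanging.

```python
import threading


def simulate_lock_inversion(timeout=0.2):
    # Hypothetical Python stand-ins for Windows' LdrpLoaderLock and
    # NVDA's inprocThreadsLock.
    loader_lock = threading.Lock()
    inproc_threads_lock = threading.Lock()
    # Ensures each thread holds its first lock before asking for the second.
    both_hold_first = threading.Barrier(2)
    # Keeps each first lock held until both threads have given up waiting.
    both_tried = threading.Barrier(2)
    results = {}

    def dllmain_thread():
        # ig7icd32's DllMain runs with the loader lock held...
        with loader_lock:
            both_hold_first.wait()
            # ...then a window message runs NVDA's hook, which wants inprocThreadsLock.
            got = inproc_threads_lock.acquire(timeout=timeout)
            if got:
                inproc_threads_lock.release()
            results["dllmain"] = got
            both_tried.wait()

    def winevent_thread():
        # NVDA's winEvent callback holds inprocThreadsLock...
        with inproc_threads_lock:
            both_hold_first.wait()
            # ...then SetWindowsHookEx needs the loader lock.
            got = loader_lock.acquire(timeout=timeout)
            if got:
                loader_lock.release()
            results["winevent"] = got
            both_tried.wait()

    threads = [threading.Thread(target=dllmain_thread),
               threading.Thread(target=winevent_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # False means the thread timed out; with the real untimed locks it would
    # have blocked forever.
    return results
```

Both entries come back `False`: neither thread can make progress while the other holds its first lock.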

It could be argued that we should try to find a way of not holding the lock while we are registering Windows hooks in response to a DLL load, but it can certainly also be argued that DllMain in a module should not really do any GUI work either.
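The first option, not holding inprocThreadsLock while registering hooks, usually takes the form "copy what you need under the lock, release it, then make the call that needs the loader lock". Below is a minimal Python sketch of that pattern under the same hypothetical stand-in locks; `pending_thread_ids` and the list append standing in for SetWindowsHookEx are illustrative assumptions, not NVDA's data structures.

```python
import threading


def simulate_snapshot_then_register(timeout=2.0):
    # Hypothetical stand-ins for LdrpLoaderLock and inprocThreadsLock.
    loader_lock = threading.Lock()
    inproc_threads_lock = threading.Lock()
    pending_thread_ids = [101, 102]  # hypothetical threads awaiting hooks
    registered = []
    results = {}

    def dllmain_thread():
        # DllMain holds the loader lock and briefly needs inprocThreadsLock.
        with loader_lock:
            got = inproc_threads_lock.acquire(timeout=timeout)
            if got:
                inproc_threads_lock.release()
            results["dllmain"] = got

    def winevent_thread():
        # Copy the shared state under inprocThreadsLock, then release it...
        with inproc_threads_lock:
            to_register = list(pending_thread_ids)
        # ...so the call that needs the loader lock runs without holding it.
        got = loader_lock.acquire(timeout=timeout)
        if got:
            registered.extend(to_register)  # stands in for SetWindowsHookEx
            loader_lock.release()
        results["winevent"] = got

    threads = [threading.Thread(target=dllmain_thread),
               threading.Thread(target=winevent_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, registered
```

Because the winEvent thread never holds both locks at once, no interleaving can form the cycle. The catch is that the snapshot can go stale once the lock is released, so real code would have to re-validate it afterwards; that is one reason such a change could be tricky and risky.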

@nvaccessAuto

Comment 2 by mdcurran on 2015-02-06 04:29
This is going to be tricky to fix, and possibly dangerous. We can only afford to spend the time, or take the risk, to fix it if it can be demonstrated that this issue is likely to affect many people.

@nvaccessAuto

Comment 3 by camlorn on 2015-02-06 21:28
The fix specifically for Pyglet appears to be the following two lines at the very top of whatever file starts the app. I am not completely convinced that they fix it, but I can no longer trigger the deadlock easily; see below.

```python
import pyglet
pyglet.options['shadow_window'] = False
```

As far as I can tell, this can potentially happen on any graphics card that uses an Intel driver which behaves this way. I believe a second condition is that the app dynamically loads OpenGL; otherwise, the DLL is loaded well before any windows are created.
If this is not the fix, it is also possible to use ctypes to dynamically load the offending DLL before importing anything that uses it.
I can't say how widespread this is. My understanding is that rather a lot of apps that use OpenGL on Windows dynamically load at least some of the runtime: for a very long time, the Windows headers have only covered OpenGL 1.1. Even then, I think it's pretty common to statically link anyway and then request the function addresses. It's been a long time since I've looked at OpenGL programming.
I may investigate a patch, since I'm one of the three people who understand what's going on here and have read the offending code, at least for the moment. We'll see, and I'm not submitting anything unless I'm reasonably certain I haven't broken five other things. It may be possible for us to look at the stack and see if we're being called from DllMain, but beyond that I'm coming up blank on how to even start dealing with it, and I'm not sure how to finish that thought anyway.
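The ctypes workaround mentioned in this comment could look roughly like the sketch below. It assumes the offending module is the ig7icd32 driver DLL named in the stack traces and that Windows can find it on the DLL search path; `preload_dll` and `_preloaded` are hypothetical names. The idea is that the DLL's DllMain runs at startup, before any window exists, so it cannot collide with NVDA's hooks later.

```python
import ctypes
import sys

# Holds references to loaded DLL objects so they stay reachable.
_preloaded = []


def preload_dll(name):
    """Load `name` up front so its DllMain runs before any window exists.

    Returns True if the DLL was loaded, False if it could not be found or
    the platform is not Windows (ctypes.WinDLL only exists on Windows).
    """
    win_dll = getattr(ctypes, "WinDLL", None)
    if sys.platform != "win32" or win_dll is None:
        return False
    try:
        _preloaded.append(win_dll(name))
    except OSError:
        return False
    return True
```

It would be called as `preload_dll("ig7icd32")` at the very top of the startup script, before `import pyglet`. A False return just means the preload did not happen, so the app still starts normally on machines without that driver.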

@nvaccessAuto

Comment 4 by camlorn on 2015-04-03 22:29
This came back; my fix isn't good enough. For this app specifically, I'm now trying something else, namely force-loading the offending DLL with ctypes. I think this may be a problem for anyone on Intel integrated graphics who runs something that loads the driver via LoadLibrary around the same time a window is created. I think statically linking everything is safe because all the DllMain functions run before the app starts, but at this point I'm not too eager to be quoted on that.

@nvaccessAuto

Comment 5 by camlorn on 2015-04-04 15:27
How problematic is it if we allow process injection to fail the first time in the event that a deadlock would occur? If process injection is triggered as needed, then we can probably just try grabbing the lock with a timeout of, say, 250 ms. I'd think this would be okay because I can restart NVDA while processes are running, but I'm not sure whether this is a one-chance sort of thing or not.
I suspect this problem is wider than Pyglet at this point. I'm trying to find a test case, but doubt that I will manage it. Without the Pyglet shadow window, the test can go 300 or so runs without deadlocking.

@nvaccessAuto

Comment 6 by camlorn on 2015-07-09 15:32
I appear to have fixed this specific case by updating the Intel driver; the Boot Camp driver for my graphics card was outdated. My test case can now run about 30,000 times without producing the deadlock, even without the hack I posted above (up from about 20 runs). It would have kept going, but that amounted to about 10 hours of running, so I stopped it. With the new drivers the deadlock is at least exceedingly rare.
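The actual test harness isn't posted in this thread. A repeated-run check like the one described could be sketched as follows: launch the test script many times and count runs that exceed a timeout, treating those as hangs. `count_hangs` and the script name `deadlock_test.py` are hypothetical.

```python
import subprocess
import sys


def count_hangs(script_args, runs, timeout=10.0):
    """Run a command `runs` times and count runs that exceed `timeout` seconds."""
    hangs = 0
    for _ in range(runs):
        try:
            # subprocess.run kills the child if the timeout expires.
            subprocess.run(script_args, timeout=timeout, check=False,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs
```

For example, `count_hangs([sys.executable, "deadlock_test.py"], runs=300)` returns how many of the 300 launches hung. The timeout should comfortably exceed a normal run's duration so that slow starts are not miscounted as deadlocks.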

@Adriani90
Collaborator

Closing as worksforme, because this depends on the graphics driver and it is apparently fixed for @camlorn. Feel free to comment if this is not the case and we can reopen it.
