First Work on Web Access Grant
Over the last two weeks, we have finally started on the research for the web access solution. Jamie and I began with a phone call where we discussed exactly where the project should go. We planned out a few very basic options, such as simply fixing up the current virtual buffers by making them faster and more accurate, or going for a completely different approach.
The different approach would navigate web content using an object-oriented idea. This would mean moving among objects on a particular level, and then moving into an object to navigate the next level down. When navigating to an object, NVDA would speak that object plus any objects inside it. In effect, rather than rendering the entire document, only the specific object being navigated to would be rendered. Though this fixes a few problems with the current implementation, it also has its drawbacks, in that it may take extra time to navigate from object to object.
Having a good understanding of how other screen readers handle web content is quite important when developing our own solution, so I spent some time looking at various screen readers (namely Jaws, Window-Eyes, System Access, Orca and Hal). I looked at how they present web content and how they let users interact with form fields. I tested with Firefox where possible, and failing that, Internet Explorer.
My findings showed that some screen readers use two modes: one for arrowing around a flat view of the page, and one where arrow presses go straight through to the application, enabling the user to interact with forms. Other screen readers integrate these two modes into one, where arrowing onto a field automatically allows further arrow presses to go to the form field if it supports them; tabbing away from the field is the means of getting out. The advantage of this method is that the user does not have to worry about two modes, removing the need to remember how to toggle between them. The disadvantage is that there is no overall logic that can easily be applied to the arrow keys when arrowing around a web page: sometimes the arrows move you around the page, and sometimes they move within a form field.
Another variation I found was the way semantic information such as link, heading, list and so on was presented to the user. Some information, such as lists, appeared in some screen readers as physical text that you had to arrow around; some information was spoken but never shown at all; and some was shown in the buffer but only took up the space of one character. Whether a field type was spoken before or after the actual content also varied with the field type, and with the screen reader.
Although this quick look at screen readers hasn't completely settled what is best for NVDA as far as navigation logic and speaking order go, it has helped us formulate more of an idea of the questions we will soon be asking screen reader users about how NVDA should look and feel when it comes to web access.
One screen reader we haven't yet been able to test is VoiceOver (the built-in screen reader in Mac OS X 10.4 and above). Many users of this screen reader report great things about how it takes a more object-oriented approach and gives a very usable access solution to users viewing web pages in Safari (the default Mac web browser).
Before we can make a better decision as to how NVDA's web access solution should allow users to navigate, we need to try out VoiceOver properly, so NV Access has hired a Mac PowerBook for the next week, enabling me to test VoiceOver with Safari and other applications and get a good feel for how things are done the Mac way.
Probably the most important piece of work I have done so far is extending a small C++ library I wrote called Gecko Walker. Now known as MSAA Walker, this software traverses a tree of MSAA objects, logging their name, role, and value to a file. I originally wrote this library before starting the grant, to time how long it took to completely traverse an MSAA tree produced from a Mozilla Gecko window with a particular web page loaded.
The code in its original state worked out of process, meaning that it executed from where the user started it, and all MSAA objects being retrieved had to be pulled across process boundaries. This is the same way NVDA currently works with MSAA objects, and is probably also the easiest way to code. I first timed a particular web page using NVDA itself to render it, which took a total of 36 seconds. This was a very large page: a quite verbose article from Wikipedia. I then timed how long it took to traverse the same page with MSAA Walker. This took a total of only 12 seconds.
These findings show that there is at least a threefold speed-up when traversing a document using pure C++ as opposed to a higher-level language such as Python. This is not to say that Python is bad; it's just that in pure C++ there is probably much less baggage to be carried around when performing a repetitive task on many objects.
Since starting on the grant, I have extended MSAA Walker so that it can also execute in-process. This means it can inject itself into the process containing the MSAA objects and run inside it. The retrieved MSAA objects then no longer have to cross process boundaries, which in theory speeds up the traversal.
Once I made the changes, I again ran MSAA Walker on the particular Wikipedia article, and this time, instead of running in 12 seconds or 36 seconds, it ran in a total of 0.8 seconds! When I extended the library, I also allowed it to count all the MSAA objects it logged, so I could make sure the same number of objects was being traversed. Sure enough, both out of process and in-process, 5403 DOM nodes were counted.
Jamie and I had heard from people that in-process execution would give us a speed-up, but like any information, it is important to test it for yourself, especially when a lot of information about in-process execution comes from programmers of commercial screen readers, where the code is not available for testing. We were very surprised by the fifteenfold speed-up (from 12 seconds down to 0.8), and after testing a few other large pages, which also yielded times similar to 0.8 seconds, we became quite a bit more certain about sticking with a virtual buffer approach. However, we are still not at all ready to completely drop non-virtualBuffer ideas, or to think about coding, as we would still like to fully test VoiceOver's web content support and get some answers from users as to how they like things presented.
Over the next week I plan to test VoiceOver, have further talks with Jamie about the questions we'll be asking, and perhaps look more closely at useful methods of sharing a large buffer of information between processes. I have already been researching memory-mapped files and other means of sharing memory.