During page load, NVDA queries all of the math on a page for speech 3 times #3413

Closed
nvaccessAuto opened this issue Aug 7, 2013 · 5 comments

@nvaccessAuto

Reported by soiffer on 2013-08-07 03:43
The bug concerns NVDA and IE+MathPlayer

When a page containing math that can be rendered by MathPlayer is being loaded, MathPlayer is queried three times for the speech for a math expression. This means that for a page that has 100 math expressions (e.g., a Wikipedia page), 300 conversions of MathML to speech happen. On average, a conversion takes around one tenth of a second, so loading the page adds 30 seconds to whatever time it otherwise takes!
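The estimate above can be written out as a back-of-the-envelope calculation. This is only a sketch: the function name and its parameters are illustrative, and the 0.1 second figure is the rough average quoted above, not a measurement.

```python
def added_load_seconds(expressions, queries_per_expression=3,
                       seconds_per_conversion=0.1):
    """Estimate the extra page-load time spent converting MathML to speech."""
    return expressions * queries_per_expression * seconds_per_conversion


# 100 expressions, queried three times each (name, value, description).
three_queries = added_load_seconds(100)
# The same page if each expression were queried only once.
one_query = added_load_seconds(100, queries_per_expression=1)

print(f"three queries: {three_queries:.0f} s, one query: {one_query:.0f} s")
```

Under these assumptions, asking once per expression instead of three times brings the added time down to a third of the original figure.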

One obvious thing is that NVDA should only ask for the math once, not three times. The three requests come because NVDA asks for the acc_value, acc_name, and acc_description. Because various screen readers request those values differently (or at least did at one point in time), MathPlayer supplies spoken text for all of them. The spoken text for acc_description differs, though, because it includes embedded speech cues for SAPI 5 speech engines. (Note: NVDA can set the TTS speech engine to SAPI 4, SAPI 5, SSML, or ECI and get higher quality speech back from MathPlayer.)

What would be even better though would be to avoid querying the name/value/description for every (math) element on the entire page during loading. For large pages with lots of math, that's a huge slowdown, as I noted above.

I've attached a simple sample page for testing. For more complicated testing, and to see how horrendously slow loading can get, see a Wikipedia page such as
http://en.wikipedia.org/wiki/Quadratic_formula#Quadratic_formula

Note: to do this testing, you need to download and install MathPlayer and set up Wikipedia to make the math accessible. You can get MathPlayer at
http://www.dessci.com/en/products/mathplayer/download.htm

and you can find instructions for making the math accessible in Wikipedia at
http://www.dessci.com/en/support/mathplayer/tsn/tsn145.htm

@nvaccessAuto

Comment 1 by jteh (in reply to comment description) on 2013-08-07 04:08
Replying to soiffer:

> One obvious thing is that NVDA should only ask for the math once, not three times. The three requests come because NVDA asks for the acc_value, acc_name, and acc_description.

We ask every MSAA object in IE for all three because they usually differ. We'd need to special-case MathPlayer here.
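A special case of this kind might look roughly like the sketch below. Everything in it is hypothetical: `FakeAccessible` is a stand-in that only counts property fetches, not NVDA's or MSAA's actual API, and the choice of `value` as the single property to query for math nodes is an assumption.

```python
class FakeAccessible:
    """Stand-in for an MSAA object; records which properties are fetched."""

    def __init__(self, tag, text):
        self.tag = tag
        self._text = text
        self.calls = []

    def get(self, prop):
        self.calls.append(prop)
        return self._text


PROPERTIES = ("name", "value", "description")


def collect_text(obj):
    """Fetch accessible text, querying math nodes only once."""
    if obj.tag == "math":
        # Assumption: one property is enough for math nodes, since
        # MathPlayer is said to return spoken text for all three.
        return {"value": obj.get("value")}
    # Everything else: the properties usually differ, so fetch each one.
    return {prop: obj.get(prop) for prop in PROPERTIES}


math_node = FakeAccessible("math", "x squared")
paragraph = FakeAccessible("p", "Some text")
collect_text(math_node)
collect_text(paragraph)
print(len(math_node.calls), len(paragraph.calls))
```

The point of the sketch is only the branch on the node type: non-math objects keep the existing three-property behaviour, while math nodes trigger a single conversion.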

> What would be even better though would be to avoid querying the name/value/description for every (math) element on the entire page during loading.

It'd be nice if we didn't have to pull the content for the entire document, but it's the only way we can provide a flat representation of the document with fast, consistent navigation.

@nvaccessAuto

Comment 3 by soiffer on 2013-08-07 05:25
A special case for a node named "math" sounds like a good idea. That would cut the time down to 1/3. If you do a special case, I highly recommend trying to get the MathPlayer interface and passing in the TTS standard associated with the voice along with the current rate, pitch, and volume (the rate is the most important). You will get much better speech out, because MathPlayer then embeds higher quality prosody information in the text it returns, and the result sounds much more intelligible.

You don't need to assume MathPlayer is there; simply check whether it is and, if so, make the four calls. I have sample code in C++ if that would help.

@nvaccessAuto

Comment 4 by jteh on 2013-08-07 05:57
Our browse mode doesn't deal with raw speech or braille, just content, so retrieving raw speech doesn't work for us. Even if we added some hack to push raw speech, we'd run into problems when the user tries to move through it word by word, character by character, etc. Also, some synths don't understand SAPI 5, SSML, etc.

@nvaccessAuto

Comment 5 by soiffer on 2013-08-07 06:16
Perhaps you misunderstood. MathPlayer just returns the text to speak. But if you tell it the standard supported by the TTS engine (SAPI 4/5, ECI, SSML), then it will generate text with embedded speech cues that are understood by the speech engine. If you store the raw speech text that includes those cues, then indeed navigation could potentially be affected by these cues (e.g., )

Other AT has dealt with this. I think they do this by simply skipping over these commands when navigating. Of course, you would need a flag on a text run that said that the embedded cues need to be skipped, but the syntax is well defined for the cues and I think the code to skip over them is relatively straightforward. I'm sure that this would take a day or so to code, but the result of having good prosody is much more intelligible speech that is much easier to listen to.
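A minimal sketch of that skipping approach, assuming the cues are XML-style tags as used by SAPI 5 and SSML. The function names, the `has_cues` flag, and the sample string are all made up for illustration; SAPI 4 and ECI use different cue syntaxes that this pattern does not cover, and a literal `<` in the text would confuse it.

```python
import re

# Matches XML-style tags such as <silence msec="100"/> or <pitch middle="2">.
CUE_PATTERN = re.compile(r"<[^<>]*>")


def strip_cues(text):
    """Return text with XML-style speech cues removed and spaces collapsed."""
    return " ".join(CUE_PATTERN.sub(" ", text).split())


def words_for_navigation(text, has_cues):
    """Split a text run into words, skipping cues only when the run is flagged."""
    return strip_cues(text).split() if has_cues else text.split()


# Hypothetical speech text with embedded cues (not real MathPlayer output).
sample = 'x <silence msec="100"/> squared <pitch middle="2">plus</pitch> 1'

print(strip_cues(sample))
print(words_for_navigation(sample, has_cues=True))
```

The raw string, cues included, would still be what gets sent to the synthesizer; only word and character navigation would work on the stripped form.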

@nvaccessAuto

Comment 6 by jteh on 2015-07-09 23:42
No longer valid: we no longer support the old IE MathPlayer (<= 3), having moved to the new MathPlayer (>= 4).
Changes:
Added labels: invalid
State: closed
