Support for reading subtitles in videos, i.e. continuous and refreshable OCR #2797
Comments
Comment 1 by jteh on 2012-11-21 01:32
Comment 2 by nvdakor on 2012-11-21 11:06
Comment 3 by jteh on 2013-02-10 22:48
Comment 4 by vortex on 2013-04-12 23:56
A company named CatchPlay uses Brightcove's Video Cloud service. It seems that NVDA could report captions. Here is a five-minute demo. Notes:
cc @josephsl
CC @josephsl for further clarification
GOM Player: F5 → Others → Accessibility → turn on "Output subtitles as window titles for the voice output program". Works with NVDA. Sometimes it doesn't work at first, so I try it two or three more times, but eventually it works correctly.
I've heard VoiceOver demos of an iPhone reading subtitles on Netflix.
How can we support the Netflix website for this issue?
@florianionascu7 a Romanian developer wrote an add-on for reading subtitles. Do you know his profile on GitHub? Maybe he can contribute here by raising a pull request.
Yes, I know his GitHub profile. It's @vortex1024.
Hello, I am not sure this would be accepted, since it doesn't fit into basic screen reader functionality. Also, the results vary greatly depending on the quality of the video and on computer performance. Any comments from the NVDA developers about this?
In my view such a feature, if optional, fits very well into basic screen reading. In fact it would be something like reading live regions. Obviously the screen reader should read text that appears on screen and is visible to a sighted person, so why not read subtitles? @vortex1024, could you please post a link to the GitHub repository of the add-on?
I think it depends on the implementation, and on whether it will make maintenance harder. @vortex1024, maybe you can give us an overview of the approach?
Sure. Basically, I run the Windows 10 OCR in a while loop, in a separate thread, with a configurable sleep interval. The area to scan can be the full screen, the focus, the foreground, or the navigator object. In full screen mode, I added options to crop zones from the screen by percentage, both to make recognition faster and easier, and to exclude certain parts of the screen from recognition, such as a TV logo or the current time of the video. I've recently been asked to make the cropping options available for the foreground object too; people seem to use it for games as well. In order not to read the same text again and again, I use Python's built-in difflib.SequenceMatcher class to determine whether the text has changed.
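A minimal sketch of the polling approach described above. The `recognize_screen` and `speak` callables are hypothetical placeholders, not NVDA's real API; only the `difflib.SequenceMatcher` diffing is taken directly from the comment.

```python
import difflib
import time

def new_text(previous: str, current: str) -> str:
    """Return only the lines of `current` not present in `previous`,
    so unchanged subtitles are not re-spoken on every OCR pass."""
    prev_lines = previous.splitlines()
    cur_lines = current.splitlines()
    matcher = difflib.SequenceMatcher(None, prev_lines, cur_lines)
    added = []
    for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if op in ("insert", "replace"):
            added.extend(cur_lines[j1:j2])
    return "\n".join(added)

def ocr_loop(recognize_screen, speak, interval=1.0, max_iterations=None):
    """Poll OCR on a fixed interval (run in a separate thread),
    speaking only text that changed since the last pass."""
    previous = ""
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        current = recognize_screen()  # e.g. Windows 10 OCR on a cropped region
        changed = new_text(previous, current)
        if changed:
            speak(changed)
        previous = current
        iterations += 1
        time.sleep(interval)
```

The diffing is deliberately line-based: subtitle text typically appears and disappears as whole lines, and comparing lines is cheaper than character-level diffing on every pass.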
I think we should instead write a handler for subtitle files. It would find the subtitles for a video using the current playback time. I think the OCR operation will damage the subtitle content. Writing a kernel-mode driver to intercept the video could also be considered.
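To illustrate the subtitle-file idea, here is a rough sketch of looking up an SRT cue by playback time. This is an illustrative simplification assuming well-formed input; real SRT parsing has more edge cases (BOMs, styling tags, overlapping cues).

```python
import re

# SRT timestamps look like 00:01:02,500 (SRT uses a comma; accept "." too).
TIME_RE = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")

def to_seconds(ts: str) -> float:
    h, m, s, ms = map(int, TIME_RE.match(ts.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000

def parse_srt(text: str):
    """Return a list of (start, end, text) cues from SRT content."""
    cues = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) >= 3 and "-->" in lines[1]:
            start, end = (to_seconds(p) for p in lines[1].split("-->"))
            cues.append((start, end, "\n".join(lines[2:])))
    return cues

def cue_at(cues, t: float):
    """Return the subtitle text visible at playback time t, or None."""
    for start, end, text in cues:
        if start <= t <= end:
            return text
    return None
```

The hard part this sketch ignores is the one the comment raises implicitly: a screen reader has no standard way to obtain the player's current playback time, which is why the OCR approach keeps coming up.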
There is also a discussion of the Lion add-on here: @vortex1024 gets a lot of positive user feedback for this add-on, and several use cases have come up for it:
Current limitations:
Maybe @jcsteh and @michaelDCurran are also interested in contributing to this. I think this feature would open up a large new area for blind and visually impaired users.
The first beta version of this add-on is here: @vortex1024, if you could upload it to your GitHub repository along with your source code, we could get a better impression of how the add-on works.
Just a quick note that I had a call from a user today requesting this functionality. They cited two examples: video games with text, where stopping to run OCR and read the results is impractical, and videos with subtitles embedded in the video image (and thus not readable otherwise).
The new whisper.cpp (a C/C++ port of OpenAI's Whisper) is quite successful at creating subtitles in many languages from video and audio, and it might be really interesting to look into its potential for NVDA, making it more independent of external sources such as OCR or auto-generated subtitles from YouTube. Here is a GUI for using whisper.cpp from Python. cc: @jcsteh, @LeonarddeR
The GUI seems to work offline without any API, so I wonder whether this would even work without an internet connection?
Replaces #11270, adding configuration. Fixes #2797.

Summary of the issue: Some videos include text which is graphical only, with no accompanying verbalization, making it inaccessible to screen reader users. OCR is very useful for this purpose. However, having to repeatedly and manually dismiss and re-recognize is tedious and inefficient. It would be useful if NVDA could do this automatically, reporting new text as it appears. This could be useful in other scenarios as well, such as virtual machines and games.

Description of how this pull request fixes the issue: When the user performs content recognition (e.g. NVDA+r for Windows 10 OCR), we now periodically (every 1.5 seconds) recognize the same area of the screen again. The result document is updated with the new recognition result, keeping the previous cursor position if possible. The LiveText NVDAObject is used to report any new text that has been added. LiveText honours the "report dynamic content changes" setting, so turning this off will prevent new text from being spoken. Because the result document is updated, braille is updated as well, allowing braille users to see changes as they occur. While Windows 10 OCR is local and relatively fast, other recognizers might not be; e.g. they might be resource intensive or use an internet service. Recognizers can therefore specify whether they support auto refresh via the allowAutoRefresh attribute, which defaults to False.
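The opt-in design described in the PR can be sketched as follows. This is an illustrative simplification, not NVDA's actual code: the class and callback names here are hypothetical, and only `allowAutoRefresh` and the 1.5-second interval come from the PR description.

```python
import threading

REFRESH_INTERVAL = 1.5  # seconds, as described in the PR

class ContentRecognizer:
    #: Whether this recognizer is fast and local enough to be safely
    #: re-run on a timer; defaults to False so slow or networked
    #: recognizers are never polled automatically.
    allowAutoRefresh = False

    def recognize(self, region) -> str:
        raise NotImplementedError

class RecognitionResult:
    """Holds the recognized text and re-recognizes the same region
    periodically when the recognizer opts in."""

    def __init__(self, recognizer, region, on_new_text):
        self.recognizer = recognizer
        self.region = region
        self.on_new_text = on_new_text  # e.g. a LiveText-style change reporter
        self._text = ""
        self._timer = None

    def start(self):
        self._refresh()

    def _refresh(self):
        text = self.recognizer.recognize(self.region)
        if text != self._text:
            self.on_new_text(text)  # speech and braille see the update
            self._text = text
        if self.recognizer.allowAutoRefresh:
            self._timer = threading.Timer(REFRESH_INTERVAL, self._refresh)
            self._timer.daemon = True
            self._timer.start()
```

The key design point is that the default is conservative: a recognizer that never sets `allowAutoRefresh = True` behaves exactly as before, with a single recognition per user command.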
Reported by nvdakor on 2012-11-15 06:59
Hi,
A number of users use a screen reader which reads subtitles on videos (for example, language subtitles for foreign film videos). Currently, NVDA does not read subtitles present in some videos, so would it be possible to allow NVDA to read them? Thanks.