
Support for reading subtitles in videos, i.e. continuous and refreshable OCR #2797

Closed
nvaccessAuto opened this issue Nov 15, 2012 · 25 comments · Fixed by #15331

@nvaccessAuto

Reported by nvdakor on 2012-11-15 06:59
Hi,
A number of users rely on a screen reader that reads subtitles in videos (for example, language subtitles for foreign films). Currently, NVDA does not read the subtitles present in some videos, so would it be possible for NVDA to read them? Thanks.

@nvaccessAuto

Comment 1 by jteh on 2012-11-21 01:32
Please provide an example URL or application. The way subtitles are implemented differs greatly between video players. It'd also be good to know which screen readers do this.

@nvaccessAuto

Comment 2 by nvdakor on 2012-11-21 11:06
Hi,
Just asked some users. They told me they're using a Korean screen reader called Sense Reader, together with GOM Player, as this is the only media player whose subtitles can be read with Sense Reader. In addition, users suggested adding subtitle reading support for VLC media player as well. Thanks.

@nvaccessAuto

Comment 3 by jteh on 2013-02-10 22:48
We can't add support for this unless the player exposes them to accessibility or native APIs.

@nvaccessAuto

Comment 4 by vortex on 2013-04-12 23:56
I wrote a program which can do this for external subtitle files, and it is compatible with NVDA. You can find more at:
my website

@surfer0627
Contributor

surfer0627 commented Apr 1, 2017

A company named CatchPlay uses Brightcove's Video Cloud service. It seems that NVDA can report its captions. Here is a 5-minute demo.

notes:

  • Captioning is more commonly used as a service to aid deaf and hearing-impaired audiences. Captions are more adaptable to live broadcasts, such as news, sports events and television shows broadcast live. Usually, captions (also called closed captions) appear as white text within a black box, a second or two after the words are spoken.

  • Brightcove: provides a cloud video platform.
  • CatchPlay: provides a streaming service in Taiwan, Singapore, and Indonesia.

@LeonarddeR
Collaborator

cc @josephsl

@ehollig
Collaborator

ehollig commented Aug 11, 2017

CC @josephsl for further clarification

@mirovg

mirovg commented Oct 10, 2018

GOM Player: press F5, then go to Others > Accessibility and turn on "Outputting subtitles as window titles for the voice output program". This works with NVDA. Sometimes it doesn't work on the first try, so I retry it two or three times, but eventually it works correctly.

@fernando-jose-silva

I've heard VoiceOver demos on iPhone reading subtitles on Netflix.
I do not know how NVDA behaves on Netflix; unfortunately I am not a user of this movie platform.
But if VoiceOver can do it, and Netflix provides the same features in its web interface or in a Windows 10 application, NVDA could also offer this functionality.

@OzancanKaratas
Collaborator

How can we support the Netflix website for this issue?

@Adriani90
Collaborator

@florianionascu7, a Romanian developer created an add-on for reading subtitles. Do you know his profile on GitHub? Maybe he can contribute here by raising a pull request.

@florianionascu7

Yes, I know his Github profile. It's: @vortex1024

@vortex1024

Hello, I am not sure this would be accepted, since it doesn't fit in basic screen reader functionality. Also, the results vary greatly depending on the quality of the video and on computer performance. Any comments from the NVDA developers about this?

@Adriani90
Collaborator

In my view such a feature, if it is optional, fits very well into basic screen reading. In fact, it would be something like reading live regions. Obviously the screen reader should read text that appears on the screen and is visible to a sighted person, so why not read subtitles?
cc: @feerrenrut, @michaelDCurran, would you accept a corresponding pull request for such a feature?

@vortex1024 could you please post a link to the github repository of the addon?

@feerrenrut
Contributor

I think it depends on the implementation, and whether it will make maintenance harder. @vortex1024 maybe you can give us an overview of the approach?

@vortex1024

Sure. Basically, I run the Windows 10 OCR in a while loop, in a separate thread, with a configurable sleep interval. The area to scan can be the full screen, the focus object, the foreground object or the navigator object. In full-screen mode, I added options to crop zones from the screen by percentage, both to make recognition faster and easier and to exclude certain parts of the screen from recognition, such as the TV logo or the current time of the video. I've recently been asked to make the cropping options available for the foreground object too; people seem to use it for games as well.

To avoid reading the same text again and again, I use Python's built-in difflib.SequenceMatcher class to determine whether the text has changed.
I haven't published the code, as I don't think it is polished enough yet, but if there's interest in a pull request, I can try to reorganise it to the best of my abilities and then improve it based on comments.
Thanks
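
The approach described above, a polling OCR loop with difflib-based change detection, could be sketched roughly like this. The `recognize` and `report` callables stand in for the actual Windows 10 OCR call and NVDA's speech output, which are assumptions not shown here; the 0.9 similarity threshold is likewise an arbitrary value for the sketch, not the add-on's actual setting:

```python
import difflib
import threading

def new_text(previous, current, threshold=0.9):
    """Return `current` if it differs enough from the previous OCR
    result, otherwise None (treat it as a repeat of the same subtitle)."""
    ratio = difflib.SequenceMatcher(None, previous, current).ratio()
    return current if ratio < threshold else None

class SubtitlePoller:
    """Run a recognizer callable in a background thread and report
    only text that has actually changed."""

    def __init__(self, recognize, report, interval=1.0):
        self._recognize = recognize  # () -> str, e.g. a Windows 10 OCR call
        self._report = report        # str -> None, e.g. speech output
        self._interval = interval
        self._last = ""
        self._stop = threading.Event()

    def _run(self):
        while not self._stop.is_set():
            changed = new_text(self._last, self._recognize())
            if changed:
                self._report(changed)
                self._last = changed
            self._stop.wait(self._interval)  # the configurable sleep interval

    def start(self):
        threading.Thread(target=self._run, daemon=True).start()

    def stop(self):
        self._stop.set()
```

Comparing full strings with `SequenceMatcher.ratio()` suppresses near-duplicates (OCR jitter on the same subtitle) while still letting genuinely new lines through.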

@OzancanKaratas
Collaborator

I think we should write a handler for subtitle files: it would find the subtitle matching the current playback position using the video time. I think OCR will damage the content of the subtitles.

Writing a kernel-mode driver, such as a video intercept, could also be considered.
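
To illustrate the subtitle-file idea, here is a minimal, simplified sketch that parses SRT-style cues and looks up the text for the current playback position. Real SRT files have more edge cases (multi-line cues with formatting, BOMs, overlapping cues) than this handles:

```python
import re

# Matches an SRT timestamp such as 00:01:02,500 (comma or dot separator).
_TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")

def _seconds(stamp):
    """Convert an SRT timestamp string to seconds as a float."""
    h, m, s, ms = _TIME.match(stamp.strip()).groups()
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def parse_srt(source):
    """Return a list of (start_seconds, end_seconds, text) cues."""
    cues = []
    for block in source.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = lines[1].split("-->")
        cues.append((_seconds(start), _seconds(end), "\n".join(lines[2:])))
    return cues

def cue_at(cues, position):
    """Return the subtitle text active at the given playback position,
    or None if no cue covers that time."""
    for start, end, text in cues:
        if start <= position <= end:
            return text
    return None
```

A player integration would call `cue_at` with the current video time and speak the result when it changes; that part depends on the player exposing its playback position, which is exactly the API availability problem raised earlier in this thread.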

@Adriani90
Collaborator

There is also a discussion about the LION add-on here:
https://forum.audiogames.net/topic/33489/lion-nvda-universal-subtitle-reader-and-more/

@vortex1024 gets a lot of positive user feedback for this add-on, and several use cases have come up:

  1. In movies with subtitles on YouTube, Netflix, VLC media player, etc. This is also very useful for deafblind people and for people watching movies and documentaries in multiple languages
  2. In mainstream games (e.g. people manage to play FIFA and lots of other games with this add-on)
  3. In online conferences where people share live presentations
    and possibly other areas.

Current limitations:

  • Creating profiles for different websites, games, apps, etc. is not possible. This problem would be solved if the add-on were part of NVDA
  • Text recognition is not 100% accurate due to OCR limitations. This could be improved if GDI calls could be used (i.e. also using the display model). For this, the help of experienced NVDA developers is needed
  • Selected text cannot be recognized by the add-on, so speech does not react as fast as the cursor moves, and it stops speaking when you move between pieces of text. This is because in most games the text is drawn on the screen and highlighted with a certain colour when selected, which is probably related to the display model. I think @LeonarddeR has proposed some improvements to NVDA's display model code; those improvements could be applied here as well
  • When subtitles appear below one another, the add-on stops working. It seems the add-on waits until one piece of text disappears and another appears. This could also be solved by using GDI calls or by improving the OCR calls in the code.

Maybe @jcsteh and @michaelDCurran are also interested to contribute to this. I think this feature would open a large new area for blind and visually impaired users.

@Adriani90
Collaborator

The first beta version of this addon is here:
http://vortex.go.ro/api/download/get/1

@vortex1024, if you could upload it to your GitHub profile along with the source code files, we could get a better impression of how the add-on works.
Thanks for your great work so far.

@seanbudd seanbudd changed the title Support for reading subtitles in videos Support for reading subtitles in videos, i.e. continuous and refreshable OCR Aug 18, 2021
@Qchristensen
Member

Just a quick note that I had a call from a user today requesting this functionality. They cited two examples: video games with text, where stopping to run OCR and read the results is impractical, and videos with subtitles embedded into the video image (and thus not otherwise readable).


@Adriani90
Collaborator

The new whisper.cpp, a C/C++ port of OpenAI's Whisper model, is quite successful at creating subtitles in many languages from videos and audio. It might be really interesting to look into its potential for NVDA, making it more independent from external sources such as OCR or automatically generated subtitles from YouTube. Here is a GUI for using whisper.cpp in Python:
https://www.reddit.com/r/Python/comments/12kyfl4/i_made_a_simple_gui_to_use_whispercpp_in_python/

cc: @jcsteh, @LeonarddeR
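
For reference, a minimal sketch of driving whisper.cpp's example `main` binary from Python to produce an .srt subtitle file. The flag names used here (`-m` for the model, `-f` for the input file, `-l` for the language, `-osrt` for SRT output) follow the whisper.cpp command-line example, but they are assumptions about a third-party tool and should be checked against `./main -h` for your build:

```python
import subprocess

def whisper_command(model_path, media_path, language="auto"):
    """Build a whisper.cpp command line that writes an .srt file
    next to the input. Flag names are per the whisper.cpp example
    binary and may differ between versions."""
    return ["./main", "-m", model_path, "-f", media_path,
            "-l", language, "-osrt"]

def transcribe(model_path, media_path):
    """Run the external binary; requires a local whisper.cpp build
    and a downloaded ggml model file."""
    subprocess.run(whisper_command(model_path, media_path), check=True)
```

Since whisper.cpp runs entirely locally, this would indeed work without an internet connection once the model file is downloaded.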

@Adriani90
Collaborator

The GUI seems to work offline without any API, so I wonder whether this would even work without an internet connection?

seanbudd pushed a commit that referenced this issue Sep 1, 2023
Replaces #11270, adding configuration.
Fixes #2797.

Summary of the issue:
Some videos include text which is graphical only with no accompanying verbalization, thus making it inaccessible to screen reader users. OCR is very useful for this purpose. However, having to repeatedly and manually dismiss and recognize is tedious and inefficient. It would be useful if NVDA could do this automatically, reporting new text as it appears.

This could be useful for other scenarios as well such as virtual machines and games.

Description of how this pull request fixes the issue:
When the user performs content recognition (e.g. NVDA+r for Windows 10 OCR), we now periodically (every 1.5 seconds) recognize the same area of the screen again. The result document is updated with the new recognition result, keeping the previous cursor position if possible.

The LiveText NVDAObject is used to report any new text that has been added. LiveText honours the report dynamic content changes setting, so turning this off will prevent new text from being spoken.

Because the result document is updated, this means braille is updated as well, allowing braille users to see changes as they occur.

While Windows 10 OCR is local and relatively fast, other recognizers might not be; e.g. they might be resource intensive or use an internet service. Recognizers can thus specify whether they want to support auto refresh using the allowAutoRefresh attribute, which defaults to False.
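
The opt-in auto-refresh pattern the commit describes could be sketched as follows. The class and function names here are illustrative, not NVDA's actual API; only the `allowAutoRefresh` attribute name and its False default come from the commit message:

```python
import time

class Recognizer:
    """Sketch of a content recognizer. Auto refresh is opt-in via an
    attribute, mirroring the allowAutoRefresh idea (default False)."""
    allowAutoRefresh = False

    def recognize(self):
        raise NotImplementedError

class FastLocalOCR(Recognizer):
    """A fast, local recognizer that can safely be re-run periodically."""
    allowAutoRefresh = True

    def __init__(self, frames):
        self._frames = iter(frames)

    def recognize(self):
        # Each call returns the text recognized from the next frame.
        return next(self._frames, "")

def refresh_loop(recognizer, report, interval=1.5, ticks=3):
    """Re-run recognition periodically, but only if the recognizer has
    opted in; report text only when it differs from the last result."""
    if not recognizer.allowAutoRefresh:
        report(recognizer.recognize())  # one-shot behaviour
        return
    last = ""
    for _ in range(ticks):
        text = recognizer.recognize()
        if text and text != last:
            report(text)
            last = text
        time.sleep(interval)  # NVDA would use a scheduled timer instead
```

Gating the refresh on an attribute keeps slow or online recognizers at the old one-shot behaviour by default, while fast local ones (like Windows 10 OCR) can opt in.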
@nvaccessAuto nvaccessAuto added this to the 2023.3 milestone Sep 1, 2023