Automatic language detection based on unicode ranges #2990

Open
nvaccessAuto opened this issue Feb 13, 2013 · 77 comments
Labels
blocked/needs-code-review, component/speech, enhancement, p3 (https://github.com/nvaccess/nvda/blob/master/projectDocs/issues/triage.md#priority), triaged (has been triaged, issue is waiting for implementation)

Comments

@nvaccessAuto

Reported by ragb on 2013-02-13 12:26

This is kind of a spin-off of #279.

As settled some time ago, this proposal aims to implement automatic text “language” detection for NVDA. The main goal of this feature is to let users read text in different languages (or, better said, language families) using the proper synthesizer voices. By using Unicode character ranges, one can determine at least the language family of a piece of text: Latin-based (English, German, Portuguese, Spanish, French, …), Cyrillic (Russian, Ukrainian, …), Kanji (Japanese, maybe Korean; I think I already wrote about that, but it is too much for my memory), Greek, Arabic (Arabic, Farsi), and others.

In broad terms, implementing this feature in NVDA requires adding a detection module to the speech subsystem that intercepts speech commands and adds “fake” language commands telling the synth to change language, based on changes in the text's characters. An interface is also needed for the user to tell NVDA which particular language to choose for each language family, that is, what to assume for Latin-based characters, what to assume for Arabic-based characters, etc.

I’ve implemented a prototype of this feature in a custom Vocalizer driver, with no interface to choose the “proper” language. Preliminary testing with Arabic users, using Arabic and English Vocalizer voices, has shown good results; that is, people like the idea. The language detection code was adapted from the Guess_language module, removing some of the detection code which was not applicable (tri-gram detection for differentiating Latin languages, for instance).

I’ll explain the decision to use, for now, only Unicode-based language detection. Language detection could also be done using trigrams (see here for instance), dictionaries, or other heuristics of that kind. However, the text passed to the synthesizer each time is very small (a line of text, a menu name, etc.), which makes these processes, which are probabilistic by nature, very error-prone. In my testing, trigram detection for Latin languages in NVDA proved completely unusable, besides adding a noticeable delay when speaking. For bigger text content (books, articles, etc.) it seems to work well; however, I don’t know whether this can be applied somehow in the future, say by analyzing virtual buffers, or anything.

Regarding punctuation, digits, and other general characters, I’m defaulting to the current language (and voice) of the synth.
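
A minimal sketch of the approach described above, not the prototype's actual code. It approximates a character's script by the first word of its Unicode name (an assumption made here in place of real script range tables), maps each script to a user-chosen language, and lets digits, punctuation and spaces stay with the language currently in effect. "Current language" is read here as the language of the preceding run, which is one possible interpretation. `SCRIPT_TO_LANG`, `script_of` and `split_by_language` are hypothetical names.

```python
import unicodedata

# Hypothetical user choices: which language to assume for each script family.
SCRIPT_TO_LANG = {"LATIN": "en", "CYRILLIC": "ru", "GREEK": "el", "ARABIC": "ar"}


def script_of(ch):
    """Approximate the script of a letter from the first word of its Unicode name."""
    if not ch.isalpha():
        return None  # digits, punctuation, spaces: no script of their own
    return unicodedata.name(ch, "").split(" ")[0] or None


def split_by_language(text, default_lang):
    """Return a list of (language, chunk) runs for the given text."""
    runs = []
    current = default_lang
    for ch in text:
        # Unknown or neutral characters keep the language currently in effect.
        lang = SCRIPT_TO_LANG.get(script_of(ch), current)
        if runs and runs[-1][0] == lang:
            runs[-1][1].append(ch)
        else:
            runs.append((lang, [ch]))
        current = lang
    return [(lang, "".join(chars)) for lang, chars in runs]
```

In a real speech pipeline, each run boundary would be where a language-change command is inserted ahead of the chunk.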

I’ll create a branch with my detection module integrated within NVDA, with no interface.

Regarding the interface for selecting what language to assume for each language group (where applicable; Greek, for instance, is only itself), I see a dialog with several combo boxes, one per language family, to choose the language to be used. I think restricting the choices to the languages available in the current synth may improve usability. I don’t know where to put that dialog, or what to call it (“language detection options”?).

Any questions please ask.

Regards,

Rui Batista
Blocked by #5427, #5438

@nvaccessAuto
Author

Comment 1 by jteh on 2013-02-13 12:29
Is this technically a duplicate of #1606? (If so, we'd probably close #1606, since this one contains more technical detail.)

@nvaccessAuto
Author

Comment 2 by ragb (in reply to comment 1) on 2013-02-13 12:37
Replying to jteh:

Is this technically a duplicate of #1606? (If so, we'd probably close #1606, since this one contains more technical detail.)

I think #1606 is only related to punctuation, although, to be honest, I don't understand that ticket's description that well.

@nvaccessAuto
Author

Comment 3 by Ahiiron on 2013-05-21 14:35
I think for usability and reliability as you said, the user would probably configure languages to auto-switch to, like the Vocalizer implementation.

@nvaccessAuto
Author

Comment 4 by dineshkaushal on 2015-07-13 05:30
Please check auto language detection.

There is a Writing Script dialog within the Preferences menu. This dialog has options to add, remove, and move languages up and down. I tested with two Devanagari languages, Hindi and Marathi, and I could get the proper language code for those languages in the log.

Code is in branch in_t2990

@nvaccessAuto
Author

Comment 6 by dineshkaushal on 2015-08-17 19:16
In this round, the adjacent ranges are merged, the code is reorganized, an option to ignore language detection for text whose language is specified by the document is added, a detailed review of the sequence is done, and comments are improved. There are two branches: in_t2990, which uses ISO 15924 script codes, with somewhat more complicated but presumably fast code, and in_t2990_simple, with the ISO codes removed and simpler, hopefully not slow, code.

@nvaccessAuto
Author

Comment 7 by jteh on 2015-09-21 05:09
Note: there was a round of code review which was unfortunately lost as a result of the recent server failure. However, the review was addressed. The following relates to the most recent changes.

Thanks for the changes, Dinesh. This looks pretty good. A few things:

gui

  • I didn't spot this the first time, but you pass the name= keyword argument when creating WritingScriptsDialog.languageList. As far as I can tell, this name argument isn't used for anything, so it should be removed. The label (which you add above) is what gets displayed.
  • You set a tool tip for the language list, but this was copied from elsewhere and isn't relevant here. It should be removed.
  • The label "Priority Language for auto language detection" is a bit awkward. Perhaps "Preferred languages for auto language detection"?
  • Looking at this further (now that I can add and remove entries, etc.), I think a wx.ListBox with single selection would be more appropriate, as a sighted user can then see all of the preferred languages.
  • If you hit any of the buttons in that dialog, the user's position in the list becomes invalid. Obviously, if the user removes an item, you can't restore to that item, but the position should at least be the item above or below. At present, the user just loses their position and pressing down arrow just throws them to the top item.
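
The position handling requested in the last point can be sketched as a small helper. `index_after_remove` is a hypothetical name, not part of NVDA; the dialog would pass its result to the list control's selection setter.

```python
def index_after_remove(removed_index, remaining_count):
    """Index to select after removing the item at removed_index.

    Returns -1 when the list is now empty, so the caller can clear the selection.
    """
    if remaining_count == 0:
        return -1
    # Stay at the same position if an item moved up into it,
    # otherwise fall back to the new last item.
    return min(removed_index, remaining_count - 1)
```

Removing the last of three items (index 2, two remaining) selects index 1, the item above; removing any other item keeps the same index, which now holds the item that was below.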

unicodeScriptHandler

  • This file needs a copyright header. Copy from another module and tweak; e.g. browseMode.py.
  • langIDToScriptID is a dict (which is unordered), which means scriptIDToLangID is also unordered. I now understand why you originally had the setdefault code, but even this doesn't solve the problem of us not really understanding what language will get chosen as a fallback. I realise it's hard to choose defaults for some scripts, but there probably should at least be some defaults; e.g. English probably makes sense for Latin given the prevalence of English text. You should be able to achieve this with an OrderedDict for langIDToScriptID. scriptIDToLangID doesn't need to be ordered; you can just use your original setdefault code for that, since you always want to take the first language. There should be comments about this, though.
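
A minimal sketch of the ordering suggestion, assuming illustrative table entries rather than NVDA's real data. Only the names `langIDToScriptID` and `scriptIDToLangID` come from the review; the contents are made up for the example.

```python
from collections import OrderedDict

# Order matters: the first language listed for a script becomes its fallback.
# These entries are illustrative only.
langIDToScriptID = OrderedDict([
    ("en", "Latin"),
    ("de", "Latin"),
    ("hi", "Devanagari"),
    ("mr", "Devanagari"),
])

scriptIDToLangID = {}
for langID, scriptID in langIDToScriptID.items():
    # setdefault keeps the first language seen for each script and ignores later ones.
    scriptIDToLangID.setdefault(scriptID, langID)
```

Plain dicts preserve insertion order from Python 3.7 onwards, so `OrderedDict` mainly matters on older interpreters, but it still documents that the order is intentional.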

unicodeScriptPrep

  • Needs copyright header; see above.
  • Please add a brief docstring explaining what this module does.

Documentation

  • Please add documentation to the User Guide concerning the new dialog.

Thanks!

@nvaccessAuto
Author

Comment 8 by dineshkaushal on 2015-09-28 08:11
Fixed all the code related issues. I have not yet added the documentation; I will do it once the code is ok. Should I modify userGuide.html?

@nvaccessAuto
Author

Comment 9 by dineshkaushal on 2015-10-07 13:34
Added documentation for Writing Scripts section in configuring NVDA main section.

@nvaccessAuto
Author

Comment 10 by James Teh <jamie@... on 2015-10-18 23:55
In commit eb09127:
Merge branch 't2990' into next

Incubates #2990.
Changes:
Added labels: incubating

@nvaccessAuto
Author

Comment 11 by jteh on 2015-10-19 01:22
Thanks. I made quite a few changes before incubating. Here are the significant ones:

  • Fixed bug where auto language detection was occurring for characters even with auto language switching disabled.
  • Fixed bug where pressing Remove button after opening the dialog removed the last item instead of the first.
  • Fixed bug where pressing the Remove or Move up buttons in the dialog when there were no items caused an exception.
  • Fixed bug in the dialog where if you removed all languages and then pressed Add, the languages you removed wouldn't appear in the Add dialog. This is a common Python mistake you should be aware of: when boolean testing a list (e.g. if not ignoreLanguages), the list being empty will be treated as False. Most of the time, this is what you want, but in some cases (like this one), you actually need to know the difference between the empty list and None (no list provided). To differentiate these, you must use: if ignoreLanguages is None
  • Translator comments: corrections, removed extraneous comments, added missing comments.
  • Changed "writing scripts" to "language detection" across the board. Looking at this as an actual user, Writing Scripts just isn't intuitive to most users. We only use this for language detection anyway. I also made some other terminology and documentation more user friendly.
  • Renamed unicodeScriptHandler module to languageDetection, as it's only used for this and this is clearer about its purpose.
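
The `None` versus empty-list pitfall mentioned above can be reproduced with a small sketch. Only the parameter name `ignoreLanguages` comes from the comment; the function names, the language lists and the fallback to a saved configuration are assumptions made for illustration.

```python
ALL_LANGUAGES = ["ar", "en", "ja"]
SAVED_PREFERRED = ["ar", "en"]  # hypothetical saved configuration


def addable_buggy(ignoreLanguages=None):
    # Bug: an empty list is falsy, so "user removed everything" is treated
    # the same as "no list provided" and the saved list is used instead.
    if not ignoreLanguages:
        ignoreLanguages = SAVED_PREFERRED
    return [lang for lang in ALL_LANGUAGES if lang not in ignoreLanguages]


def addable_fixed(ignoreLanguages=None):
    # Only fall back when no list was passed at all.
    if ignoreLanguages is None:
        ignoreLanguages = SAVED_PREFERRED
    return [lang for lang in ALL_LANGUAGES if lang not in ignoreLanguages]
```

With an emptied list, `addable_buggy([])` still hides the removed languages, while `addable_fixed([])` offers all of them again.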

@nvaccessAuto
Author

Comment 12 by MarcoZehe on 2015-10-19 10:46
This has some unwanted side effects: The latin unicode range seems to be hard-coded to English, but the range may also include French, German, and other European languages. In my case, I am bilingually working in English and German contexts all day. So even when my Windows is set to English, my synthesizer is usually set to the German voice, because I can stand the German voice speaking English, but I cannot stand the English voice, of any synthesizer, try to speak German.

In consequence: If I try to set my synth to German Anna in the Vocalizer 2.0 for NVDA, it will still use the English Samantha voice for most things, even German web pages. I have to turn off language detection completely to get my old functionality back. This will, of course, also take away the language switching where the author did use correct lang attributes on web sites or in Word documents.

@nvaccessAuto
Author

Comment 14 by James Teh <jamie@... on 2015-10-19 11:59
In commit 6fd9ad3:
Merge branch 't2990' into next: Hopefully fixed problems which caused the voice language not to be preferred for language detection.

Incubates #2990.

@nvaccessAuto
Author

Comment 16 by nishimotz on 2015-10-19 12:32
I have tested nvda_snapshot_next-12613,8dbd961 with an add-on version of Japanese TTS,
which is developed by me and supports LangChangeCommand.

For example, the word 'Yomu' ('read' in Japanese) usually consists of two characters,
0x8aad and 0x3080.

読む

The first one is an ideographic character (a Chinese character),
and the second is a phonetic character (Hiragana).

To give the correct pronunciation, a Japanese TTS should receive the two characters at the same time,
because the reading of a Chinese character is context-dependent in Japanese.

With this version of NVDA, the two letters are pronounced separately, so the reading of the first letter is wrong.
If automatic language detection is turned off, the issue does not occur.

In the unicodeScriptData.py, it seems that 0x8aad is in the range of "Han",
and 0x3080 is "Hiragana".
For the Japanese language, they should be treated as a single item in the detectedLanguageSequence.

@nvaccessAuto
Author

Comment 18 by jteh (in reply to comment 16) on 2015-10-26 11:04
Dinesh, thoughts on comment:16?

@nvaccessAuto
Author

Comment 19 by nvdakor on 2015-10-27 07:51
Hi,
To whoever coded lang detection dialog: may I suggest some GUI changes:

  • Move up/down buttons: it might be better to lay them out horizontally. Also, shouldn't these buttons become disabled once the index of the selected language (GetString Selection) reaches the lower or upper limit (0 or -1)?
  • Add/remove: It might be better to position them horizontally, following what's available in config profiles dialog.
    I'd be happy to push these changes as part of t2990 branch. Thanks.

@nvaccessAuto
Author

Comment 20 by nvdakor on 2015-10-27 07:53
Hi,
On second thoughts, I'd wait until the fundamentals are done (including fixing comment 16) before pushing GUI changes.

@nvaccessAuto
Author

Comment 21 by mohammed on 2015-10-27 13:47
hi.

another GUI change would be to only have a close button. I don't think OK and cancel are functional in this dialogue box. thoughts?

on another note, since #5427 is closed as fixed, I think it should be removed from the blocking tickets?

thanks.

@nvaccessAuto
Author

Comment 22 by jteh on 2015-10-28 00:41
Holding this back for 2015.4, as there are outstanding issues, and even if they are fixed, there won't be sufficient time for them to be tested.
Changes:
Milestone changed from near-term to None

@nvaccessAuto
Author

Comment 23 by jteh (in reply to comment 21) on 2015-10-28 00:51
Replying to mohammed:

another GUI change would be to only have a close button. I don't think OK and cancel are functional in this dialogue box.

They should be. Cancel should discard any changes you make (e.g. removing a language you didn't intend to remove), whereas OK saves them.

on another note, since #5427 is closed as fixed, I think it should be removed from the blocking tickets?

No, it shouldn't. Blocking indicates whether another ticket was required for this one, whether it's fixed yet or not. If it is fixed, it's still useful to know that it was required.

@nvaccessAuto
Author

Comment 24 by dineshkaushal on 2015-10-28 07:32
Regarding comment 16:

The problem with Han and Hiragana occurs because our algorithm assumes that each language has only one script. One possible solution is that, while building the Unicode data, we label all Han and Hiragana characters as something like HiraganaHan and then map Japanese to HiraganaHan in the language-to-script mapping. We could do the same for Chinese and Korean.

Another solution is that we could create script groups, check each character against those groups, and not split strings within a script group.

Could anyone explain which scripts are relevant for the Japanese, Chinese and Korean languages, and how the various scripts combine in these languages?

Alternatively, a pointer to a reliable reference would help.
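
The script-group idea can be sketched roughly as follows. This is an illustration under assumptions, not code from either branch: `SCRIPT_GROUPS` and `merge_script_runs` are hypothetical names, and the input is assumed to be a list of (script, text) runs that an earlier pass has already produced. Japanese is written with Han (kanji), Hiragana and Katakana, so those three form its group here.

```python
# Hypothetical script groups: scripts that jointly belong to one language
# and therefore must not be split into separate speech chunks.
SCRIPT_GROUPS = {"ja": {"Han", "Hiragana", "Katakana"}}


def merge_script_runs(runs, lang):
    """Merge adjacent (script, text) runs whose scripts are in lang's group.

    Merged runs are labelled with the language; other runs keep their script label.
    """
    group = SCRIPT_GROUPS.get(lang, set())
    merged = []
    for script, text in runs:
        key = lang if script in group else script
        if merged and merged[-1][0] == key:
            merged[-1] = (key, merged[-1][1] + text)
        else:
            merged.append((key, text))
    return merged
```

For the example from comment 16, the Han run "読" and the Hiragana run "む" come out as one chunk when Japanese is the preferred language, so the synthesizer sees the whole word.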

@nvaccessAuto
Author

Comment 26 by nishimotz on 2015-10-28 08:49
To state the conclusion first, the approach of the DualVoice add-on is much more useful for Japanese language users:

  • Latin characters are treated as second voice language, which is configurable.
  • In some cases, numbers or punctuation should be treated as Japanese (i.e. use the Japanese symbol dictionary or the Japanese TTS), so it would be nice to let the user turn this on or off. For example, non-native Japanese language users prefer numbers to be read in their native language, such as English; however, that is difficult for native Japanese users to listen to.
  • Sometimes, text mixing Latin and non-Latin characters is most naturally treated as a Japanese (the primary language) sentence. Heuristics can be used for such detection, and the user may be given a choice of priority regarding this.

I think such requirements arise because the Japanese TTS and symbol dictionary already cover wider ranges of Unicode characters, for historical reasons.

If such requirement is only for Japanese users, I will work around only for Japanese.
However, I would like to hear from other language users who have similar requirements.

@nvaccessAuto
Author

Comment 27 by jteh on 2015-10-29 00:35
Note that switching to specific voices and synthesisers for specific languages is not meant to be covered here. We'll handle that separately, as among other things, it depends on speech refactor (#4877).

@nvaccessAuto
Author

Comment 28 by nishimotz on 2015-10-29 03:01
In Japan, there are some users of Vocalizer for NVDA.

https://vocalizer-nvda.com/docs/en/userguide.html#automatic-language-switching-settings

I am asking them about their usage of this functionality.

As far as I heard, automatic language switching based on content attribute and character code should be separately disabled for Japanese language users.

@nvaccessAuto
Author

Comment 29 by jteh (in reply to comment 28) on 2015-10-29 03:09
Replying to nishimotz:

In Japan, there are some users of Vocalizer for NVDA.

As far as I heard, automatic language switching based on content attribute and character code should be separately disabled for Japanese language users.

To clarify, do you mean that these users disable language detection (using characters), but leave language switching for author-specified language enabled? Or are you saying the reverse? Or are you saying that different users have different settings, but all agree both need to be toggled separately? How well does the Vocalizer language detection implementation work for Japanese users?

For what it's worth, I'm starting to think we should allow users to disable language detection (i.e. using characters) separately. At the very least, it provides for a workaround if our language detection code gets it wrong. I'm not convinced it is necessary to separately disable author-specified language switching, though. If you disagree, can you explain why?

@nvaccessAuto
Author

Comment 30 by nishimotz on 2015-10-29 03:51
Author-specified language switching is useful for users of multilingual synthesizers, however it should be disabled in some cases.

For example, if a synthesizer supports English and Japanese, and if the actual content of a web site is written in Japanese characters, and the element is incorrectly attributed as lang='en', the content cannot be accessed at all, without turning off the author-specified language switching.
Such websites have been reported by the NVDA users in Japan.

I am now investigating the implementation of Vocalizer language detection myself; however, I have heard that it is only useful for working with multilingual materials.

@nvaccessAuto
Author

Comment 31 by nishimotz on 2015-10-29 12:41
As far as I have investigated, Vocalizer driver 3.0.12 covers various needs of Japanese NVDA users.

The important feature is:
"Ignore numbers and common punctuation when detecting text language."
Without this, automatic language detection based on characters is difficult to use with Japanese TTS.

By the way, it would be nice to allow disabling "language switching for author-specified language" and enabling "detect text language based on unicode characters" in some cases.
Vocalizer for NVDA does not allow this so far.

For example, Microsoft Word already has ability of content language detection based on character code.
For choosing visual appearance such as display font, this works very well.
However, it would be very difficult to understand if NVDA voice languages were switched by such language attributes, because a Japanese sentence usually contains half-width numbers or symbols together with full-width Japanese characters. To be pronounced correctly, they should be sent to the Japanese TTS together.

I am now asking some friends about this, but it seems Japanese users of Microsoft Word cannot use the language switching of NVDA because of this.

@nvaccessAuto
Author

Comment 32 by James Teh <jamie@... on 2015-11-02 05:30
In commit 2bba21c:
Revert "NVDA now attempts to automatically detect the language of text to enable automatic language switching even if the author has not specified the language of the text. See the Language Detection section of the User Guide for details."

This is causing problems for quite a few languages and needs some additional work before it is ready.
This reverts commits 60c25e8 and 72f8514.
Re #2990.

@nvaccessAuto
Author

Comment 33 by jteh on 2015-11-02 05:31
Changes:
Removed labels: incubating

@nvaccessAuto
Author

Comment 34 by mohammed on 2015-11-04 16:00
hi.

it'd be good if people here can try the automatic language implementation in the new add-on from Codefactory. for me it works if I choose an English voice from NVDA's voice settings dialog box. the only annoyance for me is that I hear punctuation marks with the Arabic voice regardless of "Trust voice's language when processing characters and symbols" state.

Jamie, can we probably make this functionality that has been reverted available as an add-on? because for me, it is the most successful implementation where my primary language is English and Arabic is a secondary. it worked perfectly for me.

@nvaccessAuto
Author

Comment 35 by jteh (in reply to comment 34) on 2015-11-04 22:24
Replying to mohammed:

it'd be good if people here can try the automatic language implementation in the new add-on from Codefactory.

Do you mean that the Code Factory add-on includes its own language detection, or do you mean you were trying an NVDA next build which included this functionality (before it was reverted)? I assume the second, but just checking.

Jamie, can we probably make this functionality that has been reverted available as an add-on?

Unfortunately, no; it needs to integrate quite deeply into NVDA's speech code. However, work on this isn't being abandoned. It just needs more work before it's ready for widespread testing again.

@jcsteh
Contributor

jcsteh commented Aug 15, 2017 via email

@dineshkaushal
Contributor

dineshkaushal commented Aug 16, 2017 via email

@nishimotz
Contributor

The original code treats numbers as the Common category.
Because detectScript() ignores the Common category, the language code of digits will be the same as that of the preceding characters.
For example, even if Japanese has higher priority, "Excel 2016" is spoken in English to the end.
This is difficult for Japanese language users to understand.

My modification treats digit numbers, for all languages, as their native script, so the preferred language priority is respected.
For example, if Japanese has higher priority, "Excel" is spoken in English and "2016" is in Japanese.
This is much easier to understand.
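
A rough sketch of the behaviour described here, not the actual patch. It assumes ASCII letters map to a Latin language, digits take the top preferred language, and spaces or punctuation stay with whatever came before. `split_with_digit_language` and its parameters are hypothetical.

```python
def split_with_digit_language(text, digit_lang, latin_lang="en"):
    """Split text into (language, chunk) runs, giving digits their own language."""
    runs = []
    current = latin_lang
    for ch in text:
        if ch.isdigit():
            current = digit_lang
        elif ch.isascii() and ch.isalpha():
            current = latin_lang
        # Anything else (spaces, punctuation) keeps the current language.
        if runs and runs[-1][0] == current:
            runs[-1] = (current, runs[-1][1] + ch)
        else:
            runs.append((current, ch))
    return runs
```

With Japanese as the digit language, "Excel 2016" splits into an English run "Excel " followed by a Japanese run "2016".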

@dineshkaushal
Contributor

dineshkaushal commented Aug 19, 2017 via email

@nishimotz
Contributor

Use of default language sounds good, however, I found an issue with your new revision.

setup:

  • Windows 10 Japanese (English available as additional language)
  • NVDA General settings > language : en (English)
  • NVDA preferred language : empty
  • NVDA Synthesizer : OneCore voice

procedure:

  • open NVDA menu > Preferences
  • move to "Windows 10 OCR"
  • expected : English voice "Windows ten o c r"
  • actual : English voice "Windows", Japanese "juu (ten in Japanese)", English "o c r"

@nishimotz
Contributor

Tests are working as expected.

The second parameter of detectLanguage() is given in speech.py.
The locale value is used as default language of language detector.

However, if automatic language detection is enabled in the NVDA voice settings, the locale value is set to the synthesizer's default language.
If Microsoft David is selected, the locale is set to 'en_us.'
If Microsoft Ichiro is selected in the voice settings, the locale is set to 'ja_jp,' even if the NVDA general setting is set to English.
As a result, even if the NVDA language is set to English, numbers are spoken in Japanese.

Am I correct?
Is that the expected behavior?

@nishimotz
Contributor

I have learned more about your code.
I am still not sure how the voice language (aka default language) and the preferences should be used.
For example, this test, written by me, fails.
This is because the second parameter of detectLanguage has higher priority than the preferred languages, so numbers always follow the voice language.
Is it relevant or not?

	def test_case1(self):
		combinedText = u"Windows 10 OCR"
		config.conf["languageDetection"]["preferredLanguages"] = ("ja",)
		languageDetection.updateLanguagePriorityFromConfig()
		detectedLanguageSequence = languageDetection.detectLanguage(combinedText, "en_US")
		self.compareSpeechSequence(detectedLanguageSequence, [
			LangChangeCommand("en"),
			u"Windows ",
			LangChangeCommand("ja"),
			u"10 ",
			LangChangeCommand("en"),
			u"OCR"
		])
		config.conf["languageDetection"]["preferredLanguages"] = ()
		languageDetection.updateLanguagePriorityFromConfig()

@dineshkaushal
Contributor

dineshkaushal commented Aug 21, 2017 via email

nishimotz added a commit to nishimotz/nvda that referenced this issue Aug 21, 2017
@nishimotz
Contributor

Thank you for clarifications regarding preferences.

I made new pull request which only adds tests regarding Japanese.

dineshkaushal added a commit to nvda-india/nvda that referenced this issue Aug 23, 2017
@dineshkaushal
Contributor

dineshkaushal commented Aug 23, 2017 via email

@nishimotz
Contributor

So far, Japanese language users can accept the behavior of current implementation, I think.

@mohdshara

Could you summarize what work needs to be done before you consider sending this as a PR to be reviewed? For Arabic this works as expected, and it seems this is true for Japanese too.

@dineshkaushal
Contributor

dineshkaushal commented Nov 23, 2017 via email

@zstanecic
Contributor

zstanecic commented Nov 23, 2017 via email

@dineshkaushal
Contributor

dineshkaushal commented Nov 23, 2017 via email

@zstanecic
Contributor

zstanecic commented Nov 23, 2017 via email

@feerrenrut
Contributor

Yes, it's now too late for this change to go into 2017.4. This is perhaps best anyway: the associated PR (#7629) is a large change which will take some time to review, and given the nature of the change, it will be good for many people to use it via master and next builds before it goes into a release.

@dineshkaushal
Contributor

dineshkaushal commented Nov 27, 2017 via email

@Adriani90
Collaborator

@dineshkaushal are you still considering continuing your work on this? It would be highly appreciated. Since a lot of work has been put into that PR, it would be really too bad if it were discontinued. Now that NVDA has been migrated to Python 3, I guess the PR is not compatible anymore.

@Adriani90
Collaborator

cc: @mltony

@ruifontes
Contributor

I think we should implement this...

@mltony
Contributor

mltony commented Apr 1, 2023

I already implemented this feature in Tony's enhancements add-on. However for NVDA core I would argue that we can take this idea a step further and make use of a language detection library in order to distinguish languages properly, e.g. distinguishing English from German, which is not possible with just Unicode character analysis. VoiceOver can already do this. My cursory googling revealed multiple options available:
https://towardsdatascience.com/4-python-libraries-to-detect-english-and-non-english-language-c82ad3efd430
I vaguely remember seeing someone has published an add-on for this on nvda-addons mailing list a while ago, but not sure if it's still around.
I am currently too busy to work on this, but early next year I will have a few months off from work, so if nobody implements this feature by then - I will consider implementing this myself - if NVDA devs don't mind.

@ruifontes
Contributor

Hello!

This is a big message...

In a conversation with mohammad suliman mohmad.s93@gmail.com:

mohammad suliman wrote:
delighted to announce that we are working on reintroducing the magnificent
work done by Dinesh Kaushal in pull request #7629. The PR was closed due to
lack of activity, and we wish to introduce a new one with improvements, and
while taking into account the requests for changes by Reef on the
previous PR also.

Good! I can cooperate in several tasks, but not coding, since my skills
are far away from yours!

You wrote:

First, we want to highlight that this PR is very needed for us
multilingual users. Last release, the add-on most of our community relied
on stopped working with regards to auto language switching, so some of us
opted to not update to the new version until the add-on is fixed, and
unfortunately some migrated to use other screen readers due to this kind of
issues. What we are trying to convey is that the feature is very helpful
for us multilingual users, so hopefully NV Access will triage it
accordingly.

Yes, I know that and we try to make our Vocalizer NVDA compatible as soon
as possible!

You wrote:

That means that if NVDA encounters a specific language, let's say Hebrew
for the sake of the discussion, then it will continue to speak using this
language including letters, symbols and punctuations, numbers, and emoji
also using this language

   - We think that this behavior is the preferred one for most users, but
   we are not sure whether we need to make this behavior
   configurable using checkboxes in the interface, which would enable the user
   to choose whether symbols, numbers, emojis, and so on should be spoken
   using the surrounding text language or the default one

We have chosen the second way, making it configurable, since many users
preferred to use Hebrew numbers and symbols even if the text is in English...

You wrote:
Regarding the interface, we propose the following:

   - A new panel will be created for language detection feature, and it
   will be added to the category list in Preferences of course
   - As before, the panel will include the following components:
      - A list for the preferred languages for the user, where the order
      of the languages in the list is the order in which the mechanism will
      prioritize languages
      - Buttons for moving languages up and down in the list
      - Buttons for adding and removing languages from the list

As we have in Vocalizer, I will suggest a combobox to select the voice to
use...

You wrote:

   - We propose the following components to be added also:
      - A combobox for auto language switching with the following 3
      options:
         - Off (auto language switching is disabled)
         - On (switch languages according to document language
         properties)
         - Advanced (switch languages using Unicode character properties
         as well as document language properties)

I disagree, and suggest 4 options, including:

switch languages using Unicode character properties only

This is because we found on the web a lot of pages coded as using English,
when they really are in Portuguese, Spanish and so on...

You wrote:
We also want to highlight that we credit most of this work to Dinesh
Kaushal, who has done great and hard work on this task, and it hurts that
the work hasn't been included in NVDA yet. The ideal scenario would be that
Dinesh completes this work, but as said before that the PR was closed due
to lack of activity, and we need this feature so much, so we decided to
complete Dinesh's way.

If you want also to get some coding logic or GUI from Vocalizer
Expressive, feel free to do it!

And, finally, one suggestion:

Why not use, after the language selection through the character set, one
feature to try to get the correct language through a package named
langDetect?

I have tried several similar tools, and this one proved to be the fastest
and most reliable to use...

With more than 4 words the results are almost perfect...

And it is easy to use in NVDA. It can get only the most probable language
or a set of, I think, 3 most probable languages...

Here a small add-on I made to test..:

https://www.dropbox.com/s/3jesk88koae35sg/languageDetect_1.0_Gen.nvda-addon?dl=1

.

I could not fully understand the speech module to try to include this in
our language switching mechanism...

The commands are:

"NVDA+Shift+l": "getLang",
"NVDA+Control+Shift+l": "getLangs",

Sorry for writing in private, but I think it is more productive...

Best regards,

Rui Fontes
Tiflotecnia, Lda and NVDA portuguese team
