Problem with announcing typed words in Kannada with NVDA eSpeak TTS. #4254

nvaccessAuto · 2014-07-06T05:20:19Z

Reported by Siddu on 2014-07-06 05:20
None

nvaccessAuto · 2014-07-06T05:33:33Z

Comment 1 by Siddu on 2014-07-06 05:33
I am using the eSpeak Kannada TTS voice with NVDA. While typing in Kannada, I cannot hear typed words: after I press space, NVDA does not announce the word. I have already turned on both speak typed characters and speak typed words in the typing echo options, using the key commands NVDA+2 and NVDA+3. This problem does not occur in English, which works well with NVDA and eSpeak. Typed letters are announced correctly in Kannada; only word announcement is affected. I think the silence occurs when a vowel sign is typed. If I type words without vowel signs, such as ka, kha, ga, gha, etc., it says the word. That is, it says all the letters typed after the vowel sign!

nvaccessAuto · 2014-07-06T07:04:50Z

Comment 3 by dhankuta on 2014-07-06 07:04
Hi,
This problem does not only concern Kannada. The same issue occurs in all languages of Indic origin whose scripts have joining characters, viz. Devanagari (Hindi and Nepali), Bengali, Gujarati, Kannada, Malayalam, Punjabi, Oriya, Tamil, Telugu, Sinhala and a few others.
The problem occurs not only with vowel signs but also with a few other letters that behave as diacritics.
The key point of the problem is this:
When a diacritic letter is typed, the event function treats it as a word delimiter and stops appending to the previously typed characters.
The issue is not limited to reporting typed words; it is also frequently encountered when announcing the text of many objects, including the spell checker.

nvaccessAuto · 2014-07-06T09:00:26Z

Comment 4 by jteh on 2014-07-06 09:00
Can you please provide examples of words that get "broken" incorrectly as well as some words that work correctly? We need this so we can diagnose the problem. Thanks.

nvaccessAuto · 2014-07-06T10:32:36Z

Comment 5 by Siddu on 2014-07-06 10:32
For example, I am pasting some words in Kannada here: Kā kha GA gha prasād ಕಾ ಖ GA ghaಪ್ರಸಾದ್

nvaccessAuto · 2014-07-06T10:37:00Z

Comment 6 by Siddu (in reply to comment 4) on 2014-07-06 10:37
Replying to jteh:

Can you please provide examples of words that get "broken" incorrectly as well as some words that work correctly? We need this so we can diagnose the problem. Thanks.

For example, ಕಾ ಖ GA ghaಪ್ರಸಾದ್

nvaccessAuto · 2014-07-06T12:34:55Z

Comment 7 by Siddu on 2014-07-06 12:34
OK, here is some information about Hindi with NVDA. I think Hindi and Kannada have a similar problem. Pasting it here. If a vowel sign (maatraa only) is typed after a consonant, NVDA says the typed vowel maatraa and then the consonant, as if the vowel maatraa were some sort of space. If the vowel maatraa is replaced by a complete vowel, NVDA works fine. If the word consists only of consonants, NVDA/eSpeak again works correctly. If the word consists only of consonants with a complete vowel as its final letter, NVDA/eSpeak also works correctly.

nvaccessAuto · 2014-07-06T13:35:09Z

Comment 8 by nvdakor on 2014-07-06 13:35
Hi,
Okay, is this only with eSpeak? Do you have access to other synthesizers that include a Hindi voice? If it is strictly eSpeak, then I guess we need to ask the eSpeak developers to take a look at this ticket.

nvaccessAuto · 2014-07-06T13:41:38Z

Comment 9 by Siddu on 2014-07-06 13:41
Yes, sir. The problem is only with eSpeak. Please ask them to look at this ticket. I am not using any other TTS.

nvaccessAuto · 2014-07-06T13:58:18Z

Comment 10 by Siddu on 2014-07-06 13:58
In this comment I will use Hindi as my medium of explanation. As someone rightly said, this issue affects all languages of Indic origin, not Kannada in particular. If a word consists purely of consonants, e.g. कलम, सम, झलक, or if the word starts with a vowel but the rest of it is consonants, e.g. अमर, उचल, ऐनक, then everything is read out perfectly. However, when a maatraa (half vowel) is present in the middle or at the end of the word, there is an error. When a maatraa is typed, NVDA says the preceding part of the word; e.g. when the ई is typed in कली, NVDA says कल, pause, ई. One can therefore conclude that NVDA treats the maatraa (half vowel) as a space or as the end of the word. This should not happen. The complete word should be spoken when the user presses the spacebar, not when a maatraa is typed.

nvaccessAuto · 2014-07-06T14:39:21Z

Comment 11 by Siddu on 2014-07-06 14:39
I don't think this issue lies with eSpeak. It lies with NVDA, because NVDA provides the speak typed words feature, not eSpeak. The experts will be the final judge, though. Do tell me whether my previous example made sense and, if so, whether this will be fixed.

nvaccessAuto · 2014-07-06T22:57:23Z

Comment 12 by jteh on 2014-07-06 22:57
This is definitely not an issue with eSpeak. I suspect the problem is that these characters aren't considered alphanumeric and we currently consider non-alphanumeric characters to be outside of a word. The question is what test we can use instead.
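
The suspicion above can be checked from a Python prompt. The snippet below is only a sketch: it uses Python 3 `str` objects (the thread concerns Python 2 `unicode` objects, which are assumed to classify these characters the same way), and the helper name `describe` is made up for illustration. It takes one word that was reported as working and one reported as broken, and shows what `isalnum()` returns for each character.

```python
import unicodedata

# Words from the report, written as escapes so the code points are explicit.
WORKING = "\u0915\u0932\u092e"  # कलम: consonants only
BROKEN = "\u0915\u0932\u0940"   # कली: ends in a vowel sign (maatraa)


def describe(word):
    """Return (Unicode name, isalnum result) for each character in word."""
    return [(unicodedata.name(ch), ch.isalnum()) for ch in word]


for word in (WORKING, BROKEN):
    for name, alnum in describe(word):
        print(name, alnum)
```

Every character of the working word reports `True`, while the vowel sign at the end of the broken word reports `False`, which is consistent with the behaviour described by the reporters.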

nvaccessAuto · 2014-07-07T05:32:44Z

Comment 13 by dhankuta on 2014-07-07 05:32
Hi,
I have highlighted the core of the problem. Let me clarify again.

This issue does not concern the synthesizer.
It is not language specific.
It occurs with all characters of any language that have combining characteristics (diacritic letters).
In navigation (wd_word, i.e. control+left/right arrows or control+shift+left/right arrows), Windows/NVDA treats these characters neither as word delimiters nor as non-alphanumeric.
But in the say character/word functionality, they are treated as word delimiters.
For sighted persons, these characters appear with a dotted circle when they stand alone. The dotted circle means that there must be a letter in place of the circle. Grammatically, these characters cannot be written on their own. They represent other letters, especially vowels, when preceded or followed by a consonant. As a result, in the literature they are referred to by the word 'sign', but they are purely letters, not signs.
In Arabic languages too, characters with similar characteristics exist.
The event handler that decides whether to keep concatenating the just-typed character wrongly considers these characters to be word delimiters.
I guess the problem lies in the speech module, in the speakTypedCharacters(ch) function. I once looked at it but could not understand what this line means:
if ch.isalnum():
So I abandoned the attempt to fix it.
Anyway, it is a serious issue.
Him Prasad Gautam.
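
To make the role of that line concrete, here is a simplified model of a typed-word buffer. It is not NVDA's code: the function `spoken_words` and its parameters are hypothetical, and it only mimics the general idea of appending characters while a test passes and announcing the buffer when it fails. The example uses Python 3.

```python
def spoken_words(keys, is_word_char):
    """Return the words that would be announced for a sequence of typed keys.

    Characters accepted by is_word_char are appended to a buffer; any other
    character flushes the buffer as one spoken word.
    """
    buffer = []
    spoken = []
    for ch in keys:
        if is_word_char(ch):
            buffer.append(ch)
        elif buffer:
            spoken.append("".join(buffer))
            buffer = []
    return spoken


# कली followed by a space, typed one character at a time.
typed = "\u0915\u0932\u0940 "
print(spoken_words(typed, str.isalnum))
```

With `str.isalnum` as the test, the vowel sign flushes the buffer early, so only कल is announced and nothing is left to announce when space is pressed. A word made only of consonants, such as कलम, is announced whole.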

nvaccessAuto · 2014-07-07T06:35:32Z

Comment 14 by dhankuta on 2014-07-07 06:35
Hi again,
My guess seems right. The source of the problem is the line I mentioned:
ch.isalnum():
is the bug.
But can any expert tell me where the isalnum() function comes from?
Is it from the Python core or from an NVDA module?
If anyone can point me to the exact module that provides isalnum(), I will try to fix this permanently.

By redefining the diacritic characters of Kannada and Nepali as non alphanumeric, I have temporarily fixed the bug for now.
However, the code is very primitive. As a test only, it worked well.

nvaccessAuto · 2014-07-07T11:12:53Z

Comment 15 by Siddu on 2014-07-07 11:12
Thank you, Him Prasad Gautam sir. Friends, please reply to his query.

nvaccessAuto · 2014-07-07T11:19:34Z

Comment 16 by jteh on 2014-07-07 11:19
As I explained in comment:12, we need to figure out what test to use to work out what characters are considered as part of a word. Right now, we only consider alphanumeric characters to be part of a word, but these characters obviously aren't treated as alphanumeric, which I guess makes sense. The question is what test we can use which covers everything nicely.

Btw, the isalnum method is called on a Python unicode object.

nvaccessAuto · 2014-07-07T11:23:44Z

Comment 17 by Siddu on 2014-07-07 11:23
OK, sir. I am using Unicode text to type Kannada.

nvaccessAuto · 2014-07-07T13:01:30Z

Comment 18 by Siddu on 2014-07-07 13:01
OK. I am pasting some Kannada Unicode text here, to be read with eSpeak. Text below: ಎಲ್ಲರೊಳಗೊಂದಾಗು ಮಂಕುತಿಮ್ಮ

nvaccessAuto · 2014-07-07T13:50:24Z

Comment 19 by Siddu on 2014-07-07 13:50
Kannada Unicode chart download: this file contains an excerpt from the character code tables and list of character names ... http://www.unicode.org/charts/PDF/U0C80.pdf

nvaccessAuto · 2014-07-08T01:33:40Z

Comment 20 by dhankuta on 2014-07-08 01:33
Hi,
I am working along these lines:

I have prepared a list of all the relevant characters for the languages I mentioned in my first comment.
I am adding a few new conditional lines to the sayTypedWord function so that these characters are treated as alphanumeric.
Let me finish these two tasks and get feedback from users of the languages concerned.
I have already tested in my language and the issue is gone. Now I am checking other features.
However, this approach fixes the bug in one place only; if the same isalnum call is used somewhere else, the issue remains. I am not able to find every isalnum call in the NVDA modules.
What if I provide all such characters to you? We can add the lists of similar characters for the remaining languages in future.
Alternatively, could you tell me where isalnum is used?

nvaccessAuto · 2014-07-08T02:28:54Z

Comment 21 by jteh on 2014-07-08 02:28
I'd prefer to avoid matching against specific characters. It'd be better to find one or more rules that match all appropriate characters. They aren't alphanumeric, but they must fit into some category or another.

nvaccessAuto · 2014-07-08T03:00:56Z

Comment 22 by dhankuta on 2014-07-08 03:00
Hi,
Within ten minutes, with a simple trick, I located every place where isalnum is used in the whole NVDA source.
There are not many: just speech and textInfos/offsets!
Thanks to Notepad++.

Yes, I agree that checking each character individually is improper. That is why I had already said that the code of the first fix is very primitive.
I hope I will be able to adopt a proper approach.
However, I will try to fix the case first and share it. We can discuss it then. For now, let us pause.
Anyway, a big issue for many languages is resolved!
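
For anyone who wants to repeat that search without an editor, a small script can do the same job. This is only a sketch in Python 3: the function name `find_calls` and the default search string are illustrative choices, and it does a plain text match rather than parsing the code.

```python
import os


def find_calls(root, needle=".isalnum("):
    """Return (path, line number, stripped line) for each line of every
    .py file under root that contains needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if needle in line:
                        hits.append((path, lineno, line.strip()))
    return hits
```

Calling `find_calls("source")` from the top of a checkout (the directory name is an assumption) would list each matching file and line, which is enough to confirm how many call sites need to change.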

nvaccessAuto · 2014-07-08T03:37:35Z

Comment 23 by jteh on 2014-07-08 03:37
It seems all of the affected characters have a Unicode category of mark (M). Therefore, we should be able to use Unicode categories instead of isalnum and check for letter (L), mark (M) and number (N).

For reference: Unicode category values
Changes:
Milestone changed from None to next
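
The category test proposed in this comment can be sketched with Python's standard `unicodedata` module. The function name `is_word_char` is made up for illustration, and the example is written for Python 3 rather than the Python 2 `unicode` objects discussed in the thread.

```python
import unicodedata


def is_word_char(ch):
    """True if ch is a letter (L*), mark (M*) or number (N*).

    unicodedata.category returns a two-letter code such as "Lo" or "Mc";
    the first letter is the major category.
    """
    return unicodedata.category(ch)[0] in "LMN"


# ಪ್ರಸಾದ್ from the report: consonants plus virama and a vowel sign.
word = "\u0caa\u0ccd\u0cb0\u0cb8\u0cbe\u0ca6\u0ccd"
for ch in word:
    print(unicodedata.name(ch), unicodedata.category(ch), is_word_char(ch))
```

Every character of the Kannada word passes this test, including the virama and the vowel sign, while spaces and punctuation such as commas still fail it and therefore still end a word.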

nvaccessAuto · 2014-07-08T03:53:16Z

Comment 24 by James Teh <jamie@... on 2014-07-08 03:53
In [713d98e]:

Fix incorrect breaking of words at marks such as vowel signs and virama in Indic languages.

speech.speakTypedCharacters and textInfos.offsets.find{Start,End}OfWord were using unicode.isalnum to check for characters that are part of a word, but this only covers alphanumeric characters. The marks in question aren't alphanumeric, but should still be considered part of a word.
Therefore, use the Unicode category of the character and include letters, marks and numbers.
Re #4254.
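
As a rough illustration of what the change means for word boundaries in text, the sketch below compares the two tests. It is not the code from the commit: `find_start_of_word` and `find_end_of_word` here are simplified, hypothetical stand-ins that ignore line boundaries and whitespace handling, and the example uses Python 3.

```python
import unicodedata


def is_word_char(ch):
    """Letters, marks and numbers count as part of a word."""
    return unicodedata.category(ch)[0] in "LMN"


def find_start_of_word(text, offset, is_part=is_word_char):
    """Move left from offset while the previous character belongs to the word."""
    while offset > 0 and is_part(text[offset - 1]):
        offset -= 1
    return offset


def find_end_of_word(text, offset, is_part=is_word_char):
    """Move right from offset; return the index just past the word."""
    while offset < len(text) and is_part(text[offset]):
        offset += 1
    return offset


text = "\u0915\u0932\u0940 \u0915\u0932\u092e"  # कली कलम
old_end = find_end_of_word(text, 1, str.isalnum)  # stops before the vowel sign
new_end = find_end_of_word(text, 1)               # includes the vowel sign
print(text[:old_end], text[:new_end])
```

With `str.isalnum` the first word is cut to कल, whereas the category-based test yields the whole word कली.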

nvaccessAuto · 2014-07-08T04:11:21Z

Comment 25 by jteh on 2014-07-08 04:11
It'd be great if affected users could test this try build and report whether it fixes the problem. If it does, I'll merge it into next for wider testing. Thanks.

nvaccessAuto · 2014-07-08T07:49:52Z

Comment 26 by dhankuta on 2014-07-08 07:49
Hi Jamie,
I tested in three languages.
The fix is OK.
There is no more unusual breaking when saying typed words.
Go ahead with merging.
I am in contact with users of the remaining languages and will report if any issue arises.
Thanks a lot.

nvaccessAuto · 2014-07-08T07:55:45Z

Comment 27 by James Teh <jamie@... on 2014-07-08 07:55
In [66ab0df]:

Merge branch 't4254' into next

Incubates #4254.

Changes:
Added labels: incubating

nvaccessAuto · 2014-07-14T09:08:44Z

Comment 29 by Siddu on 2014-07-14 09:08
Finally, the Kannada language problem is solved! Yes, after checking the master version of NVDA, NVDA is now announcing word by word. I am thankful to all the developers for solving this problem, ticket #4254. Special thanks to Him Prasad Gautam for the new development. I am happy now. The test master version of NVDA is available to download from http://dl.dropboxusercontent.com/s/ah109slknykpgt0/nvda_source-master-09b564e.exe

nvaccessAuto · 2014-07-14T09:18:16Z

Comment 30 by Siddu on 2014-07-14 09:18
Finally, the Kannada language problem is solved! Yes, after checking the master version of NVDA, NVDA is now announcing word by word. I am thankful to all the developers for solving this problem, ticket #4254. URL to visit: http://community.nvda-project.org/ticket/4254 Special thanks to Him Prasad Gautam sir, from Nepal, for the new development. Great man! I am happy now. The test master version of NVDA is available to download from http://dl.dropboxusercontent.com/s/ah109slknykpgt0/nvda_source-master-09b564e.exe

nvaccessAuto · 2014-07-15T03:44:20Z

Comment 31 by jteh on 2014-07-15 03:44
My fix has already been merged into next for wider testing, so there's no need to use a custom build.

nvaccessAuto · 2014-08-05T00:44:05Z

Comment 32 by James Teh <jamie@... on 2014-08-05 00:44
In [7ffeb00]:

Fix incorrect breaking of words at marks such as vowel signs and virama in Indic languages.

speech.speakTypedCharacters and textInfos.offsets.find{Start,End}OfWord were using unicode.isalnum to check for characters that are part of a word, but this only covers alphanumeric characters. The marks in question aren't alphanumeric, but should still be considered part of a word.
Therefore, use the Unicode category of the character and include letters, marks and numbers.
Fixes #4254.

Changes:
Removed labels: incubating
State: closed

nvaccessAuto · 2014-08-05T00:45:34Z

Comment 33 by jteh on 2014-08-05 00:45
Changes:
Milestone changed from next to 2014.3

nvaccessAuto added bug component/i18n existing localisations or internationalisation labels Nov 10, 2015

nvaccessAuto assigned jcsteh Nov 10, 2015

nvaccessAuto added this to the 2014.3 milestone Nov 10, 2015

nvaccessAuto closed this as completed Nov 10, 2015
