Better support for handling compound characters in languages such as Korean and Tamil #2791

nvaccessAuto · 2012-11-09T09:28:53Z

Reported by nvdakor on 2012-11-09 09:28
Hi,
Normally, Unicode assigns one code point per character. This works well for Latin-based scripts such as English. However, languages such as Korean and Tamil use characters that are composed of several components, that is, smaller character components are combined to make up a single character. A good example is Korean, where a character is composed of an initial consonant, a vowel, and zero, one or two final consonants.
Support for handling such compound characters would be useful for proofreading scenarios (such as pressing Numpad5 three times to spell the word with character descriptions). As of 2012.3, one of two things occurs:

In languages such as Korean, the character itself is announced again instead of individual char components.
In languages such as Tamil, only the first part of the char is announced.
There are scripts out there which allow calculation of the Unicode values for the components that make up a single compound character. However, one concern is whether such an approach would apply to just one language or to several. A potential issue is covering the possible test cases for compound characters, given script differences between languages (which also involves looking up the base Unicode value for characters in a particular language).
In the end, the ideal result would be: when a user presses the review current character/word command twice or three times to obtain character descriptions, NVDA would announce the individual components of a compound character, instead of translators writing tens of thousands of possible compound character combinations in characterDescriptions.dic, which helps with performance as well.
Thanks.
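
For illustration, the Korean case described above can be computed arithmetically rather than looked up. The following is a minimal sketch of the Hangul syllable decomposition arithmetic defined in the Unicode Standard; the function name `decompose_hangul` is hypothetical and is not part of NVDA. Note that Unicode encodes a final consonant cluster as a single trailing jamo, so the result has at most three parts.

```python
# Constants from the Unicode Standard's Hangul syllable decomposition arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 vowel/final combinations per initial consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables in total


def decompose_hangul(char):
    """Split one precomposed Hangul syllable into its conjoining jamo.

    Hypothetical helper for illustration only. Any character outside the
    precomposed syllable block is returned unchanged as a one-item list.
    """
    s_index = ord(char) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [char]
    initial = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    final = T_BASE + s_index % T_COUNT
    parts = [chr(initial), chr(vowel)]
    if final != T_BASE:  # T_BASE itself means "no final consonant"
        parts.append(chr(final))
    return parts


if __name__ == "__main__":
    # U+D55C (the syllable "han") splits into U+1112, U+1161, U+11AB.
    print([hex(ord(part)) for part in decompose_hangul("\ud55c")])
```

Because the components are derived from the code point itself, a description dictionary would only need entries for the individual jamo rather than for every precomposed syllable.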

nvaccessAuto · 2012-11-09T11:12:36Z

Comment 1 by jteh on 2012-11-09 11:12
Questions:

Should this apply to spell word with character descriptions (numpad5 thrice)?
- Mesar thinks yes.
- Wouldn't this be potentially confusing? You wouldn't be able to tell when the characters were really written separately and when they were compound but split by NVDA. For example, in a word with two compound characters each decomposing to two characters, you wouldn't be able to tell whether there were really two characters or four.
I think this might be bad for some other languages. For example, "a acute" might have a specific character description, but this way, the user would hear the character description for "a" and then "acute". The latter probably doesn't even have a description. How do we stop this from negatively impacting those languages?
- Should we make this a configuration option? That kind of sucks for Korean and whatever other languages want this, though.
- I guess we could somehow make this configurable for each locale. This would probably need to be hard-coded.

Technical: You can decompose compound characters with unicodedata.normalize("NFD", compoundChar)
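
As a rough sketch of how the `unicodedata.normalize` call could be combined with the per-locale idea raised above, the example below decomposes text only for locales that opt in. The set `DECOMPOSE_LOCALES` and the function `spelling_units` are assumptions made for illustration and do not exist in NVDA.

```python
import unicodedata

# Hypothetical opt-in list: locales whose users want compound characters
# split into components when spelling. Not an NVDA setting.
DECOMPOSE_LOCALES = {"ko"}


def spelling_units(text, locale):
    """Return the units that would be described when spelling `text`.

    For opted-in locales the text is decomposed with NFD first, so each
    component becomes its own unit. For all other locales the text is
    left as typed, so "a acute" keeps its single description.
    """
    language = locale.split("_")[0]
    if language in DECOMPOSE_LOCALES:
        return list(unicodedata.normalize("NFD", text))
    return list(text)


if __name__ == "__main__":
    # Korean: one precomposed syllable becomes three jamo.
    print(len(spelling_units("\ud55c", "ko")))
    # Spanish: U+00E1 (a with acute) stays a single unit.
    print(len(spelling_units("\u00e1", "es")))
    # Without the locale gate, NFD would split it into "a" + U+0301.
    print(len(unicodedata.normalize("NFD", "\u00e1")))
```

One caveat: a Tamil syllable such as U+0B95 U+0BBE is already stored as two code points, and NFD leaves it unchanged, so the Tamil behaviour reported above is probably about grouping code points into one spoken unit rather than about decomposition.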

nvaccessAuto · 2014-09-07T08:24:42Z

Comment 3 by blindbhavya on 2014-09-07 08:24
Hi.
What is the keyboard to spell current word or current character in NVDA laptop keyboard layout?
A similar issue was fixed for v 2014.3, but I am saying similar because I am not sure whether both were the same and therefore I want to test this on my laptop without a numeric keypad.

nvaccessAuto · 2014-09-25T23:12:57Z

Comment 5 by jteh (in reply to comment 3) on 2014-09-25 23:12
Replying to blindbhavya:

> What is the keyboard to spell current word or current character in NVDA laptop keyboard layout?

NVDA+control+. and NVDA+. respectively. See section 5.5 of the User Guide.

> A similar issue was fixed for v 2014.3, but I am saying similar because I am not sure whether both were the same and therefore I want to test this on my laptop without a numeric keypad.

They're not the same issue. This one hasn't yet been fixed.

bhavyashah · 2017-08-14T05:06:11Z

@josephsl As the original author of this ticket, could you please respond to the questions asked in @jcsteh's #2791 (comment)?

josephsl · 2017-08-14T05:10:09Z

Hi,
Almost five years later:

Korean: fixed by adding character components in symbols and char descriptions dictionaries thanks to work from prominent leaders of the Korean NVDA community.
Tamil: will need to ask on the translations mailing list.

Thanks.

Adriani90 · 2020-04-25T14:05:03Z

@josephsl could you please ask on the translation lists about the current behavior in NVDA 2020 Beta? There was PR #10550 which might have improved this.

josephsl · 2020-04-25T17:19:19Z

Hi, probably not at this point unless I’m wrong (I have moved on from Korean translation at this point, but we can ask translators once 2020.1 is released). Thanks.

bhavyashah · 2020-06-23T08:42:23Z

@josephsl Since NVDA version 2020.1 has been released, this is a friendly reminder to reconsider #2791 (comment).

josephsl · 2020-06-23T16:58:52Z

Hi,

In this case, it would be better to let Korean users comment on this, as they can tell us if things have improved.

Thanks.

Adriani90 · 2023-02-28T13:56:21Z

@khsbory, @ungjinPark, @dnz3d4c could you please give an update on this issue as well? How is it working in NVDA 2023.1 Beta?

Adriani90 · 2024-03-24T11:13:54Z

Closing this issue as abandoned, no updates from Korean users. For Tamil language, this is already covered in #1428 I think.

nvaccessAuto added enhancement component/i18n existing localisations or internationalisation labels Nov 10, 2015

feerrenrut added the feature/i18n Internationalization features label Apr 29, 2020

Adriani90 added the Abandoned requested reports or updates are missing since more than 1 year, author or users are not available. label Mar 24, 2024

Adriani90 closed this as completed Mar 24, 2024
