Better support for handling compound characters in languages such as Korean and Tamil #2791

Closed
nvaccessAuto opened this issue Nov 9, 2012 · 11 comments
Labels: Abandoned, component/i18n, enhancement, feature/i18n

Comments

@nvaccessAuto

Reported by nvdakor on 2012-11-09 09:28
Hi,
Normally, Unicode assigns one code point per character. This works well for Latin-based scripts and similar ones such as English. However, there are languages such as Korean and Tamil that use characters composed of several components; that is, component characters are combined to make up a single character. A good example is Korean, where a character is composed of an initial consonant, a vowel and zero, one or two final consonants.
Support for handling such compound characters would be useful for proofreading scenarios (such as pressing Numpad5 three times to spell the word with character descriptions). As of 2012.3, one of two things occurs:

  • In languages such as Korean, the character itself is announced again instead of individual char components.
  • In languages such as Tamil, only the first part of the char is announced.

There are scripts out there that calculate the Unicode values of the characters that make up a single compound character. However, one concern is whether such an approach would apply to just one language or to several. A potential issue is catching all the test cases when dealing with compound characters, because scripts differ between languages (this also involves looking up the base Unicode value for characters in a particular language).
In the end, the ideal result would be this: when a user presses the review current character/word command twice or three times to obtain character descriptions, NVDA would announce the individual components of a compound character. Translators would then not need to write tens of thousands of possible compound character combinations in characterDescriptions.dic, which would help with performance as well.
Thanks.
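
The kind of calculation described in this report can be sketched for Korean in Python. This is a minimal sketch, not NVDA code: the function name `decompose_hangul` is hypothetical, and the constants are those of the Hangul syllable decomposition arithmetic in the Unicode Standard. It covers only precomposed Hangul syllables (U+AC00 to U+D7A3); Tamil would need a different approach.

```python
# Constants from the Unicode Standard's Hangul syllable decomposition arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # syllables per initial consonant (588)
S_COUNT = L_COUNT * N_COUNT  # total precomposed syllables (11172)


def decompose_hangul(ch):
    """Return the conjoining jamo of one precomposed Hangul syllable.

    Hypothetical helper: characters outside the Hangul Syllables block
    are returned unchanged as a one-element list.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [ch]
    initial = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    parts = [initial, vowel]
    t_index = s_index % T_COUNT
    if t_index:  # index 0 means "no final consonant"
        parts.append(chr(T_BASE + t_index))
    return parts
```

For example, `decompose_hangul("한")` returns the three jamo U+1112, U+1161 and U+11AB. Note that Unicode encodes a double final consonant as a single trailing jamo, so a syllable yields at most three components with this approach.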
@nvaccessAuto
Author

Comment 1 by jteh on 2012-11-09 11:12
Questions:

  • Should this apply to spell word with character descriptions (numpad5 thrice)?
    • Mesar thinks yes.
    • Wouldn't this be potentially confusing? You wouldn't be able to tell when the characters were really written separately and when they were compound but split by NVDA. For example, in a word with two compound characters each decomposing to two characters, you wouldn't be able to tell whether there were really two characters or four.
  • I think this might be bad for some other languages. For example, "a acute" might have a specific character description, but this way, the user would hear the character description for "a" and then "acute". The latter probably doesn't even have a description. How do we stop this from negatively impacting those languages?
    • Should we make this a configuration option? That kind of sucks for Korean and whatever other languages want this, though.
    • I guess we could somehow make this configurable for each locale. This would probably need to be hard-coded.
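
The per-locale idea in the last bullet could look roughly like the following. This is only a sketch under stated assumptions: `DECOMPOSE_LOCALES` and `components_for_description` are hypothetical names, not part of NVDA, and the locale set is hard-coded purely for illustration. It uses Python's standard `unicodedata` module.

```python
import unicodedata

# Hypothetical hard-coded set of languages that opt in to decomposition.
DECOMPOSE_LOCALES = frozenset({"ko"})


def components_for_description(char, locale):
    """Return the pieces that should each get a character description.

    Locales that have not opted in keep the character whole, so a
    precomposed "a acute" still gets its own description.
    """
    base_lang = locale.split("_")[0]
    if base_lang in DECOMPOSE_LOCALES:
        return list(unicodedata.normalize("NFD", char))
    return [char]
```

With this shape, `components_for_description("á", "pt_BR")` leaves the character intact, while a Korean syllable under `"ko"` is split into its jamo.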

Technical: You can decompose compound characters with unicodedata.normalize("NFD", compoundChar)
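
A small illustration of that call, as a sketch rather than NVDA code (the helper names `decompose` and `describe` are made up for this example):

```python
import unicodedata


def decompose(compound_char):
    """Split text into its canonical (NFD) components."""
    return list(unicodedata.normalize("NFD", compound_char))


def describe(compound_char):
    """Return the Unicode name of each component (code point as fallback)."""
    return [
        unicodedata.name(part, "U+%04X" % ord(part))
        for part in decompose(compound_char)
    ]


print(describe("한"))
# ['HANGUL CHOSEONG HIEUH', 'HANGUL JUNGSEONG A', 'HANGUL JONGSEONG NIEUN']
print(describe("á"))
# ['LATIN SMALL LETTER A', 'COMBINING ACUTE ACCENT']
```

The second call shows the "a acute" concern raised above: NFD splits it into a base letter plus a combining mark. A Tamil syllable such as கி is already stored as two code points (U+0B95 and U+0BBF), so NFD leaves it unchanged and the pieces are available simply by iterating over the string.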

@nvaccessAuto
Author

Comment 3 by blindbhavya on 2014-09-07 08:24
Hi.
What is the keyboard command to spell the current word or current character in the NVDA laptop keyboard layout?
A similar issue was fixed for v 2014.3, but I am saying similar because I am not sure whether both were the same and therefore I want to test this on my laptop without a numeric keypad.

@nvaccessAuto
Author

Comment 5 by jteh (in reply to comment 3) on 2014-09-25 23:12
Replying to blindbhavya:

What is the keyboard command to spell the current word or current character in the NVDA laptop keyboard layout?

NVDA+control+. and NVDA+control+. See section 5.5 of the User Guide.

A similar issue was fixed for v 2014.3, but I am saying similar because I am not sure whether both were the same and therefore I want to test this on my laptop without a numeric keypad.

They're not the same issue. This one hasn't yet been fixed.

@nvaccessAuto nvaccessAuto added enhancement component/i18n existing localisations or internationalisation labels Nov 10, 2015
@bhavyashah

@josephsl As the original author of this ticket, could you please respond to the questions asked in @jcsteh's #2791 (comment)?

@josephsl
Collaborator

Hi,
Almost five years later:

  • Korean: fixed by adding character components in symbols and char descriptions dictionaries thanks to work from prominent leaders of the Korean NVDA community.
  • Tamil: will need to ask on the translations mailing list.

Thanks.

@Adriani90
Collaborator

@josephsl could you please ask on the translation lists about the current behavior in NVDA 2020 Beta? There was PR #10550 which might have improved this.

@josephsl
Collaborator

josephsl commented Apr 25, 2020 via email

@feerrenrut feerrenrut added the feature/i18n Internationalization features label Apr 29, 2020
@bhavyashah

@josephsl Since NVDA version 2020.1 has been released, this is a friendly reminder to reconsider #2791 (comment).

@josephsl
Collaborator

Hi,

In this case, it would be better to let Korean users comment on this, as they can tell us if things have improved.

Thanks.

@Adriani90
Collaborator

@khsbory, @ungjinPark, @dnz3d4c could you please give an update on this issue as well? How is it working in NVDA 2023.1 Beta?

@Adriani90 Adriani90 added the Abandoned requested reports or updates are missing since more than 1 year, author or users are not available. label Mar 24, 2024
@Adriani90
Collaborator

Closing this issue as abandoned, as there have been no updates from Korean users. For the Tamil language, I think this is already covered in #1428.
