Associating a group of symbols (or any symbols in a particular group) with one pronunciation under symbols.dic #2676

Open
nvaccessAuto opened this issue Sep 19, 2012 · 5 comments

Comments

@nvaccessAuto

Reported by nvdakor on 2012-09-19 06:28
Hi,
In symbols.dic for some languages, apart from using regex, is it possible (or would it be possible) to perform the following:

  • Create a group of symbols.
  • The group of these individual symbols would be given a single pronunciation.
  • Any individual symbol within this pronunciation group would use this one pronunciation when spoken.

This could be useful for tonal languages such as Korean and Vietnamese, which associate one pronunciation with multiple individual symbols. This could also help with faster symbols processing, as a translator doesn't have to define same pronunciation (one per line) for any number of individual symbols.

For example, in the current symbols.dic syntax:
    sym(tab)pronunciation(tab)punctuationLevel
And suppose we wish to assign the letters "a" and "b" to be pronounced as "A":
    a(tab)A(tab)none
    b(tab)A(tab)none
Following the proposal above, we could say:
    {a,b}(tab)A(tab)none
For providing punctuation levels for each individual symbols in the braces, I'd like to propose:
    {(A(tab)puncLevel),(b}(tab)A(tab)none
with priority given to puncLevel for symbols surrounded by parentheses.

Thanks.
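To make the proposal concrete, here is a minimal sketch of how a loader might expand a grouped line into the per-symbol lines that exist today. The function name `expand_group` and the choice to split on commas are assumptions for illustration only; nothing here reflects how NVDA actually parses symbols.dic.

```python
def expand_group(line):
    """Expand '{a,b}<TAB>A<TAB>none' into one tab-separated entry per symbol.

    Hypothetical helper: the brace/comma grouping syntax is the one proposed
    in this ticket, not an existing symbols.dic feature.
    """
    fields = line.split("\t")
    symbol_field, rest = fields[0], fields[1:]
    is_group = (
        len(symbol_field) > 2
        and symbol_field.startswith("{")
        and symbol_field.endswith("}")
    )
    # A group yields each member; anything else is a single ordinary symbol.
    symbols = symbol_field[1:-1].split(",") if is_group else [symbol_field]
    return ["\t".join([sym] + rest) for sym in symbols]


for entry in expand_group("{a,b}\tA\tnone"):
    print(entry.replace("\t", "(tab)"))
```

The grouped line simply becomes the two lines shown in the current syntax, so each symbol still ends up with its own entry after loading.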
@nvaccessAuto

Comment 1 by jteh (in reply to comment description) on 2012-09-19 08:29
Thanks for the suggestion. My thoughts:

Replying to nvdakor:

In symbols.dic for some languages, apart from using regex,

Why is regex a problem? It's not particularly difficult to do this with a regex. You just enclose the symbols in square brackets; e.g. [abc]. The only disadvantage is that you can only have 90 or so complex symbols.
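As a rough illustration of the character-class approach, the sketch below uses Python's `re` module to give every symbol in a class the same replacement. The `speak` function and the replacement text "A" are made up for the example; in symbols.dic the pattern would be attached to a complex symbol rather than called from code like this.

```python
import re

# One character class stands in for the whole group of symbols.
group_pattern = re.compile(r"[abc]")


def speak(text, replacement="A"):
    """Replace every symbol matched by the class with one pronunciation."""
    return group_pattern.sub(replacement, text)


print(speak("a-b-d"))
```

Symbols outside the class ("-" and "d" here) are left untouched, which is the same effect the grouping proposal is after.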

{a,b}(tab)A(tab)none

One problem with this is that we'd have to escape {. For example, to match just {, the user would have to write \{ to distinguish it from a grouping. This would break user symbol files. Also, users would not be able to configure the pronunciation of the individual symbols if they wanted to, although maybe this is intended.
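A small sketch of the compatibility concern, assuming a backslash escape (hypothetical, as is the function name `parse_symbol_field`): once braces mean "group", an existing entry whose symbol happens to start and end with braces is silently read differently unless the user goes back and escapes it.

```python
def parse_symbol_field(field):
    """Return the symbols a field denotes under the hypothetical grouping syntax."""
    if field.startswith("\\{"):
        # Escaped brace: drop the backslash and treat the rest literally.
        return [field[1:]]
    if len(field) > 2 and field.startswith("{") and field.endswith("}"):
        # Unescaped braces: treat the contents as a comma-separated group.
        return field[1:-1].split(",")
    return [field]


print(parse_symbol_field("{a,b}"))    # read as a group
print(parse_symbol_field("\\{a,b}"))  # read as one literal symbol
print(parse_symbol_field("{x}"))      # an old literal entry now reads as a group
```

The last call is the breaking case: a user file written before the change would need editing to keep its old meaning.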

This could also help with faster symbols processing, as a translator doesn't have to define same pronunciation (one per line) for any number of individual symbols.

Internally, it won't really be any faster. The individual symbols will still be treated the same way.

For providing punctuation levels for each individual symbols in the braces, I'd like to propose:

{(A(tab)puncLevel),(b}(tab)A(tab)none

Even if we implement symbol grouping, I think symbol grouping with levels adds complexity (both to the code and for translators) with no advantage. If the level is defined separately, other parameters might be defined separately too. There's definitely no speed advantage here.

To summarise:

  • This is a fairly significant change that will have to be carefully implemented to avoid breaking user symbol files.
  • There is little to no speed advantage.

Given the above, the big question is: do you think this is hugely necessary or was it just a nice idea?

@bhavyashah

@josephsl As the original author of this ticket, could you please respond to the questions asked in @jcsteh's #2676 (comment)?

@Adriani90
Collaborator

@josephsl, any updates regarding this issue? Is anyone working on it?

@Adriani90
Collaborator

Having this feature in symbols.dic would really make things simpler and would reduce the complexity of the file, especially for mathematical alphanumeric characters. If I implement these in symbols.dic (over 900 characters), there are multiple versions of the letters a, b, c (double-struck, script, etc.) and multiple versions of numbers, such as subscript and superscript. We don't need the full details of a character's name in symbols.dic, so, for example, all versions of the small letter a could be associated with the pronunciation "a", and so on.

@Adriani90
Collaborator

The full name of a character could then be retrieved with the help of an add-on from the Unicode databases etc. on demand (i.e. character information add-on which already exists).
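To illustrate the idea outside of symbols.dic, here is a sketch using Python's standard `unicodedata` module: NFKC normalization folds the styled variants mentioned above to their plain letter or digit, and `unicodedata.name` gives the full character name on demand. This is only an illustration of the grouping; it is not how NVDA or the character information add-on works, and the function names are made up.

```python
import unicodedata


def base_pronunciation(char):
    """Fold a styled variant to its plain compatibility form via NFKC."""
    return unicodedata.normalize("NFKC", char)


def full_name(char):
    """Look up the full Unicode name, for when the details are wanted."""
    return unicodedata.name(char, "<unnamed>")


# Double-struck small a, script small a, superscript two, subscript two.
variants = ["\U0001D552", "\U0001D4B6", "\u00B2", "\u2082"]

for char in variants:
    print(base_pronunciation(char), "|", full_name(char))
```

Both variants of the letter fold to "a" and both variants of the digit fold to "2", while the full name remains available separately.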
