Associating a group of symbols (or any symbols in a particular group) with one pronunciation under symbols.dic #2676
Comments
Comment 1 by jteh (in reply to comment description) on 2012-09-19 08:29 Replying to nvdakor:
Why is regex a problem? It's not particularly difficult to do this with a regex. You just enclose the symbols in square brackets; e.g. ![abc]. The only disadvantage is that you can only have 90 or so complex symbols.
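As a rough illustration of the character-class approach (this is plain Python `re`, not NVDA's actual symbol processing code; the function name `replace_group` is hypothetical):

```python
import re

# A character class matches any single one of the listed symbols.
# Here "a", "b" and "c" are all treated as one group.
GROUP_PATTERN = re.compile(r"[abc]")


def replace_group(text, pronunciation):
    """Replace every symbol in the group with the same pronunciation."""
    return GROUP_PATTERN.sub(pronunciation, text)
```

With this sketch, `replace_group("abcd", "A")` replaces the three grouped letters and leaves `d` untouched.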
One problem with this is that we'd have to escape {. For example, to match just {, the user would have to write \{ to distinguish it from a grouping. This would break existing user symbol files. Also, users would not be able to configure the pronunciation of the individual symbols if they wanted to, although maybe this is intended.
Internally, it won't really be any faster. The individual symbols will still be treated the same way.
Even if we implement symbol grouping, I think symbol grouping with levels adds complexity (both to the code and for translators) with no advantage. If the level is defined separately, other parameters might be defined separately too. There's definitely no speed advantage here. To summarise:
Given the above, the big question is: do you think this is hugely necessary or was it just a nice idea?
@josephsl As the original author of this ticket, could you please respond to the questions asked in @jcsteh's #2676 (comment)?
@josephsl, any updates regarding this issue? Is anyone working on it?
Having this feature in symbols.dic would really simplify things and reduce the complexity of the file, especially for mathematical alphanumeric characters. If I implement these in symbols.dic (over 900 characters), there are multiple versions of the letters a, b, c (double-struck, script, etc.) and multiple versions of numbers (subscript, superscript, etc.). We don't need the full details of a character's name in symbols.dic, so, for example, all versions of the small letter a could be associated with the pronunciation "a", and so on.
The full name of a character could then be retrieved with the help of an add-on from the Unicode databases etc. on demand (i.e. character information add-on which already exists).
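To make the idea concrete, here is a minimal sketch using Python's standard `unicodedata` module. It assumes that NFKC compatibility normalization is an acceptable way to find the plain character behind a styled variant; that is my assumption, not something NVDA or the add-on is known to do, and the function names are hypothetical.

```python
import unicodedata


def base_pronunciation(char):
    """Fold a styled variant (double-struck, superscript, ...) to its
    plain compatibility equivalent. Assumption: NFKC is good enough."""
    return unicodedata.normalize("NFKC", char)


def full_name(char):
    """Look up the full Unicode name on demand; empty string if unnamed."""
    return unicodedata.name(char, "")


double_struck_a = "\U0001D552"
print(base_pronunciation(double_struck_a))  # plain letter used for speech
print(full_name(double_struck_a))           # full name, only when requested
```

The short form would be what gets spoken by default, while the full name stays available for anyone who asks for character details.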
Reported by nvdakor on 2012-09-19 06:28
Hi,
In symbols.dic for some languages, apart from using regex, is it possible (or would it be possible) to perform the following:
This could be useful for tonal languages such as Korean and Vietnamese, which associate one pronunciation with multiple individual symbols. This could also help with faster symbol processing, as a translator doesn't have to define the same pronunciation (one per line) for any number of individual symbols.
For example, in the current symbols.dic syntax:
sym(tab)pronunciation(tab)punctuationLevel
And suppose we wish to assign the letters "a" and "b" to be pronounced as "A":
a(tab)A(tab)none
b(tab)A(tab)none
Following the proposal above, we could say:
{a,b}(tab)A(tab)none
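A minimal sketch of how such a line could be expanded into the existing one-symbol-per-line form. The function `expand_group_line` is hypothetical, and the parsing rules (comma-separated symbols inside braces, a lone `{` kept as a literal symbol) are assumptions made for illustration, not part of NVDA:

```python
def expand_group_line(line):
    """Expand a grouped entry into one (symbol, pronunciation, level)
    tuple per symbol. Fields are tab-separated, as in symbols.dic."""
    symbol_field, pronunciation, level = line.rstrip("\n").split("\t")
    is_group = (
        len(symbol_field) > 2
        and symbol_field.startswith("{")
        and symbol_field.endswith("}")
    )
    if is_group:
        # Assumed syntax: symbols separated by commas inside the braces.
        symbols = symbol_field[1:-1].split(",")
    else:
        # Anything else, including a lone "{", is a single literal symbol.
        symbols = [symbol_field]
    return [(symbol, pronunciation, level) for symbol in symbols]


for entry in expand_group_line("{a,b}\tA\tnone"):
    print(entry)
```

Note that this naive version cannot put a comma inside a group, and a symbol such as `{x}` would be misread as a group, which is one form of the escaping problem with braces.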
To provide punctuation levels for individual symbols within the braces, I'd like to propose:
{(A(tab)puncLevel),(b}(tab)A(tab)none
With the priority given to puncLevel for symbols surrounded by parentheses.
Thanks.