Read HTML entities, unicode characters, other symbols #3805
Comments
Comment 1 by paulbohman on 2014-01-22 23:58
Latin-1 Supplement:
Arrows:
Mathematical operators:
Geometric shapes:
Miscellaneous symbols:
Dingbats:
Supplemental arrows-A:
Miscellaneous symbols and arrows:
Emoji:
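One rough way to measure how much of each block listed above is still unlabelled is to walk its code points and check each one against the dictionary. Below is a minimal Python sketch. It assumes the dictionary has already been loaded into a dict keyed by character. `missing_symbols` and the sample `known` mapping are hypothetical, not part of NVDA:

```python
import unicodedata

# Inclusive code-point ranges for the Unicode blocks listed above.
# "Emoji" is not a single Unicode block, so it is not included.
BLOCKS = {
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Arrows": (0x2190, 0x21FF),
    "Mathematical Operators": (0x2200, 0x22FF),
    "Geometric Shapes": (0x25A0, 0x25FF),
    "Miscellaneous Symbols": (0x2600, 0x26FF),
    "Dingbats": (0x2700, 0x27BF),
    "Supplemental Arrows-A": (0x27F0, 0x27FF),
    "Miscellaneous Symbols and Arrows": (0x2B00, 0x2BFF),
}


def missing_symbols(block, known):
    """Return assigned characters in `block` (excluding control/format ones) with no entry in `known`."""
    start, end = BLOCKS[block]
    missing = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        # Skip controls, format characters and unassigned code points (category C*).
        if unicodedata.category(ch).startswith("C"):
            continue
        if ch not in known:
            missing.append(ch)
    return missing


# Example: a dictionary that only knows the left and right arrows.
known = {"\u2190": "left arrow", "\u2192": "right arrow"}
gaps = missing_symbols("Arrows", known)
print(len(gaps), "unlabelled characters in Arrows, e.g.", " ".join(gaps[:4]))
```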
Comment 2 by jteh on 2014-01-23 00:27
This is likely to be a pretty time-consuming job, but it is just a matter of editing a file with tab-separated values and following the documentation, so marking as goodForNewDev.
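The job is mostly editing tab-separated lines. A small reader for such lines could help when reviewing a large batch of additions, and here is a minimal Python sketch of one. It assumes a simplified `symbol<TAB>replacement[<TAB>level]` layout. NVDA's real symbols.dic has more columns and sections, and `parse_symbol_lines` is a hypothetical helper:

```python
def parse_symbol_lines(lines):
    """Parse simplified tab-separated symbol lines into {symbol: (replacement, level)}.

    Assumed format, a simplification of NVDA's symbols.dic:
        symbol<TAB>replacement[<TAB>level]
    Blank lines, '#' comments and section headers such as "symbols:" are skipped.
    """
    entries = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Section headers have no tab; entries with an empty replacement
        # (no agreed description yet) are skipped as well.
        if len(fields) < 2 or not fields[1]:
            continue
        # None means the level column was omitted.
        level = fields[2] if len(fields) > 2 and fields[2] else None
        entries[fields[0]] = (fields[1], level)
    return entries


sample = [
    "# Hypothetical excerpt, not copied from NVDA",
    "symbols:",
    "\u2191\tup arrow\tsome",
    "\u2020\tdagger",
    "\u205e\t",
]
print(parse_symbol_lines(sample))
```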
Comment 3 by paulbohman on 2014-01-23 00:35
And believe me, I know how time-consuming it is. Just writing my blog entry took many hours of typing and testing. As far as HTML entities go, I'm not sure where the most authoritative source is. This one seems pretty inclusive: http://dev.w3.org/html5/html-author/charref
Of these, my personal highest priorities would be anything to do with math, plus arrows, plus things like the section and paragraph signs and superscripts. There may be a few others I'd like, but those come to mind.
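Python's standard library ships the HTML5 named character references as `html.entities.html5`. That table could be used to collect candidate entities for a given range, such as arrows or math operators. Below is a minimal sketch. `entities_in_range` is a hypothetical helper, and its output is only a starting point for writing spoken descriptions:

```python
import html
from html.entities import html5


def entities_in_range(start, end):
    """Map each character in [start, end] to the HTML5 entity names that produce it."""
    found = {}
    for name, value in html5.items():
        # html5 lists some legacy names both with and without ';', and some
        # entities expand to two code points; keep single-character ';' forms only.
        if not name.endswith(";") or len(value) != 1:
            continue
        if start <= ord(value) <= end:
            found.setdefault(value, []).append(name)
    return found


arrows = entities_in_range(0x2190, 0x21FF)  # Arrows block
for ch in ("\u2191", "\u2193"):
    print(ch, arrows[ch])
print(html.unescape("&minus;&dagger;&sect;&para;"))
```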
Comment 4 by paulbohman on 2014-01-23 00:45
Comment 5 by driemer.riemer@... on 2014-01-23 02:58
Comment 6 by briang1 on 2014-01-23 08:32
Also, what happens in places like Usenet newsgroups, where I think greater-than or less-than is used to denote quoted text? I have made sure these are only spoken at the "all" punctuation level, but it might well be that one needs to say "quoted text" the first time one encounters the symbol in a news client. Also, presumably, the translators will need to edit each of these manually for every language NVDA supports. There are a heck of a lot.
Comment 7 by nvdakor on 2014-01-23 10:15
Comment 8 by nishimotz on 2014-01-23 12:32
Comment 9 by driemer.riemer@... on 2014-01-23 22:38
Comment 10 by jteh (in reply to comment 8) on 2014-01-23 23:00
As I understand it, the symbol table should be able to handle this; it can handle multiple code points in the same entry. The issue is more whether APIs can pass multi-code-point characters to us. Some work might be needed on speak spelling/character descriptions.
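Multi-code-point entries matter in a concrete way. Suppose a table holds both a single emoji and that emoji followed by a skin-tone modifier. Lookup then has to try the longer key first, or the pair is read as two separate symbols. Below is a minimal sketch using greedy longest match over a plain dict. It illustrates the idea only and does not reflect how NVDA implements its symbol lookup. `longest_match_tokens` is hypothetical:

```python
def longest_match_tokens(text, table):
    """Split text into symbol descriptions using greedy longest match against `table`.

    Keys in `table` may span several code points (for example an emoji
    followed by a skin-tone modifier), so longer keys are tried first.
    Characters with no entry are returned unchanged.
    """
    max_len = max((len(key) for key in table), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible key at position i, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No entry starts here: pass the single code point through.
            out.append(text[i])
            i += 1
    return out


table = {
    "\U0001F44D": "thumbs up",
    "\U0001F44D\U0001F3FD": "thumbs up: medium skin tone",
}
print(longest_match_tokens("\U0001F44D\U0001F3FD \U0001F44D", table))
```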
Comment 11 by jteh (in reply to comment 9) on 2014-01-23 23:01
Ideally, a Git branch. Failing that, a patch. As a last resort, just the modified file, but please tell us which revision of the code it is based on. Always base changes on the master branch. Thanks.
Comment 12 by JohnHoltRipley on 2014-02-26 13:54
Comment 13 by nvdakor on 2014-03-13 23:18
Comment 14 by nvdakor on 2014-03-13 23:22
Comment 15 by driemer.riemer@... on 2014-08-17 04:52
Comment 18 by ajirving on 2014-10-23 18:17
Comment 19 by jteh (in reply to comment 18) on 2014-10-23 21:49
The problem with this is that it is English only and cannot be localised.
Comment 20 by siddhartha_iitd on 2014-12-01 07:08
For some characters, no standard description could be found, so those characters are included without any description. If we want to keep them, we might have to use non-standard descriptions. Please check the branch in_t3805 at the following URL:
Comment 21 by jteh on 2014-12-01 10:04
Comment 22 by siddhartha_iitd (in reply to comment 21) on 2014-12-01 11:24
Thanks for the quick reply! The characters without any standard description have been removed from the symbols.dic file. The updated symbols.dic file is available in branch in_t3805 at the following URL:
Comment 24 by driemer.riemer@... on 2014-12-01 15:05
Comment 25 by jteh (in reply to comment 24) on 2014-12-01 21:56
I suggest we start with the changes proposed in comment:20. If you're interested, it'd be good if you can take a look at that.
Blocking doesn't make sense. This is about the default experience. Customising is a complementary feature, but not a requirement for improving the default experience.
Comment 26 by dineshkaushal on 2014-12-06 07:14
Comment 27 by jteh (in reply to comment 26) on 2014-12-06 10:19
The Unicode names are English only and overly verbose for screen reader use.
Comment 28 by dineshkaushal on 2015-03-26 11:17
Is this branch going to be part of 15.2?
Comment 29 by siddhartha_iitd on 2015-03-26 14:17
Comment 30 by jteh on 2015-04-14 07:11
Comment 31 by Michael Curran <mick@... on 2015-04-30 04:57
Changes:
Comment 32 by Michael Curran <mick@... on 2015-06-19 19:11
Changes:
Comment 33 by mdcurran on 2015-06-19 19:15
Will definitely look into this more. You have a few kinds of symbols. Some, like the math and fraction symbols, need to be read in context. Others are more ornamental. The other thing to think about is whether something should happen when there is no label. Currently, if you arrow over a character without a label, no speech output is given. Should NVDA speak "Unicode 205E" or give some indication that the key press was indeed executed?
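The "Unicode 205E" fallback suggested here could look like the following minimal sketch, which assumes a plain dict of descriptions. `describe_char` is hypothetical. Full Unicode character names are English-only and verbose. The hex code point, by contrast, is short and the same in every language. Only the word "Unicode" would need translating:

```python
def describe_char(ch, table):
    """Return the table description for `ch`, or a hypothetical code-point fallback."""
    if ch in table:
        return table[ch]
    # Fallback for characters without a label, e.g. "Unicode 205E".
    return f"Unicode {ord(ch):04X}"


print(describe_char("\u205e", {}))
print(describe_char("\u2020", {"\u2020": "dagger"}))
```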
There are indeed a lot of synths that try to be too clever and speak things they should not, like expanding abbreviations that were not intended in the text, so that is entirely possible.
As I think I have said before, I have heavily modified my standard symbols spoken output to reflect general UK English usage, but I would not want to impose it by default on anyone who did not want it.
Added support for unicode up / down arrows and fraction symbols. Part of issue #3805
@jage9: Do you have an idea about how to take this further? We now have emoji dictionaries, and loading them takes some time, although it is barely noticeable. The sluggishness in the symbols dialog is addressed in a PR, so that shouldn't hold us back from adding more symbols. I have a branch that implements #9138, and I could add more symbols to that branch to continue the work on this issue.
Is there a master list for Unicode symbols similar to what was used for emoji? There are some differences to consider, though.
Not sure if this is the appropriate thread or if I should open a new one. I've discovered that NVDA (2018.4.1) doesn't read the HTML &minus; entity (−). For some additional context, most screen readers will announce "-$200" (using a dash/hyphen) as "minus two hundred dollars", but not TalkBack. I'm planning to submit a bug report to them as well, but IMO NVDA is the worse offender here. Treatment of hyphens can be argued; treatment of the minus entity cannot. If the author explicitly used the minus entity, then the screen reader should respect that.
Can you post an example of this?
As I wrote in the email, it is spoken for me. What is your punctuation setting?
Thanks. The entity is not included in the current dictionary and definitely should be, along with the other math symbols added previously.
-$100 reads correctly, while
−$100 does not read the minus sign regardless of punctuation.
I'm guessing this needs a modification in symbols.dic, though it may be a bit more complex than adding the symbol. Does this go in the area where negative numbers are currently handled with the hyphen-minus? We need this particular minus sign to behave the same way.
Probably need to work with this line
negative number (?<!\w)-(?=[$£€¥.]?\d)
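One quick way to test the idea is to add U+2212 MINUS SIGN to the characters that the negative-number rule accepts. This assumes the symbols.dic pattern is evaluated as a Python regular expression. `EXTENDED` is a hypothetical variant, not a pattern that exists in NVDA:

```python
import re

# Negative-number pattern as quoted above: ASCII hyphen-minus only.
ORIGINAL = re.compile(r"(?<!\w)-(?=[$£€¥.]?\d)")

# Hypothetical extension: also accept U+2212 MINUS SIGN in the same position.
EXTENDED = re.compile(r"(?<!\w)[-\u2212](?=[$£€¥.]?\d)")

for text in ("-$100", "\u2212$100"):
    print(repr(text), bool(ORIGINAL.search(text)), bool(EXTENDED.search(text)))
```

The extended pattern still does not match the minus in "3 − 5". The lookahead requires an optional currency symbol or decimal point followed by a digit directly after the sign.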
@jage9 could be something as simple as this?
PS: For general reading, would adding this after the plus entry suffice?
Never contributed to NVDA before, so forgive me if I'm missing something.
If this only worked with negative numbers (where the symbol is next to the number, as currently happens with the dash and hyphen-minus), it would still fail basic math (3 − 5 = −2). I think the minus symbol needs to be handled as a standalone symbol, not just in negative numbers.
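Below is a minimal two-pass sketch of that behaviour, assuming a regex approach. The first pass handles negative numbers with an extended version of the negative-number pattern quoted earlier in the thread. The second pass reads any remaining U+2212 as a standalone operator. The function name and wording are hypothetical. The two words are separate parameters so that a translation could choose different ones:

```python
import re

# Extended negative-number pattern: hyphen-minus or U+2212 directly before a number.
NEGATIVE = re.compile(r"(?<!\w)[-\u2212](?=[$£€¥.]?\d)")
# Any remaining U+2212, with surrounding whitespace, is a standalone operator.
STANDALONE = re.compile(r"\s*\u2212\s*")


def speak_minus(text, negative_word="minus", minus_word="minus"):
    """Hypothetical two-pass rule: negative numbers first, then standalone minus signs."""
    text = NEGATIVE.sub(negative_word + " ", text)
    return STANDALONE.sub(f" {minus_word} ", text)


print(speak_minus("3 \u2212 5 = \u22122", negative_word="negative"))
# prints: 3 minus 5 = negative 2
```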
Since we have free time in the world, I'm backtracking to my Jan 10, 2019 comment and wondering if people have thoughts. I'm happy to greatly expand the symbols.dic file for English, as I did previously for math, if there isn't a large concern about load time. I'm not sure what this would mean for translations, or whether there is a quicker way. But this is a long-standing feature request that could use some closure, since many of these symbols are vital in various professions and hobbies.
Here is a source that helps me:
See also:
Please consider enabling NVDA to recognize playing cards. I include some possible descriptions below; you might have better ones. The situations I came across only need the playing card back and the common 52 playing cards (ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, jack, queen, and king in the 4 suits of spades, hearts, diamonds, and clubs). I included the other characters because it seemed wrong to deliberately miss them out, and I know so few card games that I am not in a position to judge how often people need those characters.
🂠 playing card
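The Unicode Playing Cards block (U+1F0A0 to U+1F0FF) puts each suit in its own row of 16 code points, with the ace at offset 1 and the knight at offset C. Because of that regular layout, the 53 entries requested here could be generated rather than typed by hand. Below is a minimal Python sketch. The "rank of suit" wording is a placeholder, not agreed NVDA wording:

```python
# Row base for each suit in the Unicode Playing Cards block (U+1F0A0..U+1F0FF).
SUITS = {0x1F0A0: "spades", 0x1F0B0: "hearts", 0x1F0C0: "diamonds", 0x1F0D0: "clubs"}

# Offset of each rank within a row. Offset 0xC is the knight, which is not
# part of the common 52-card deck and is skipped here.
RANKS = {1: "ace", **{n: str(n) for n in range(2, 11)}, 0xB: "jack", 0xD: "queen", 0xE: "king"}


def playing_card_entries():
    """Return {character: description} for the card back plus the 52 common cards."""
    entries = {"\U0001F0A0": "playing card"}  # U+1F0A0 PLAYING CARD BACK
    for base, suit in SUITS.items():
        for offset, rank in RANKS.items():
            entries[chr(base + offset)] = f"{rank} of {suit}"
    return entries


for ch, desc in playing_card_entries().items():
    print(f"{ch}\t{desc}")  # tab-separated, in the spirit of symbols.dic
```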
With NVDA using eSpeak NG with "English (America)", I expected to hear "cap a grave" for À and "a grave" for à when reading character by character. Instead of the name of the character, I get the sound of the character. This behavior seems different from that for characters with nearby Unicode code points. Is this what you expect to happen?
I'm also encountering a similar issue. The "-" character is not read by NVDA + Firefox. However, it is read by VO + Safari.
Reported by paulbohman on 2014-01-22 23:33
Many HTML entities, Unicode characters, and other symbols are not read by default, or at all. When web developers or content writers put these characters or symbols in their content, it's almost always because they're using them to convey some meaning. There are exceptions, of course, when symbols may be used for decorative purposes, but I don't think that's the norm.
As an example, NVDA reads the left and right arrow HTML entities (← and →), but for some reason NVDA doesn't read up arrow or down arrow. When web authors use these characters, it's usually because they are conveying some meaning, like up to the next level, or down a level, or next page or previous page. Or maybe they're using them to explain the NVDA shortcut keys: Control plus alt plus up arrow, for example.
Similarly, symbols like the dagger or double dagger symbol might be used for footnotes. There are a lot of other characters and symbols out there -- and I realize that the magnitude of the lists of characters is an issue -- but in most cases they're used to convey meaning, so I would want them read by default.
For most of them, it's enough to simply say what the character is: "dagger" or "heart" or whatever the symbol is. I wouldn't worry about trying to interpret "I heart you" and changing it to "I love you." Just say what the symbol is.
Blocking #3752