Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle non-breaking spaces for indentation indication #5610

Closed
wants to merge 2 commits into from

Conversation

ahicks92
Copy link
Contributor

I've seen this a lot, only just figured out what it is. if you're in the web browser and indentation indication is on, non-breaking spaces cause it to blow up annoyingly. For an example, see this thread, post 6. It causes indentation to be read in Firefox, Thunderbird, etc. as "space space space space..."
If someone says where the indentation function is, I can probably submit a PR. I imagine this isn't hard.

@jcsteh
Copy link
Contributor

jcsteh commented Dec 15, 2015 via email

@ahicks92
Copy link
Contributor Author

There's two things I'd like to do to this that I think make it better, though it looks like best may not be achievable:

  • Replace all non-breaking space with spaces. This will break when in locales that mix it with non-space characters, but will at least begin doing something sane when it's mixed with spaces. I'm obviously biased, but this is the most common place I see them.
  • Group indentation characters by type. "Space tab space tab space tab" is bad, "3 tab 3 space" is better but comes with the downside that you need to check for where the spaces are manually. Of these, I think this one is the most controversial, and it might be worth dropping, but it would at least do something somewhat sane for locales where the default indentation isn't space (if those exist).

@jcsteh
Copy link
Contributor

jcsteh commented Dec 18, 2015

  • Replace all non-breaking space with spaces.

I guess we could do this; it'd just be two passes instead of one, which is fine. We can do this just for indentation text so we're not doing it over the all text; it's only useful for indentation reporting.

  • Group indentation characters by type. "Space tab space tab space tab" is bad, "3 tab 3 space" is better

It can be quite different visually. At least in decent editors (Notepad++ certainly does this), tab doesn't always just get mapped to a fixed number of spaces. Rather, it moves to the next point evenly divisible by that number of spaces. If you do space tab, it will indent the same distance as just tab; i.e. the tab consumes the equivalent of 3 spaces. So, tab space tab space tab looks the same as 3 tabs, but not the same as 3 tab 2 space.

@ahicks92
Copy link
Contributor Author

Everyone says mixing them is a bad idea. If we said "3 space 3 tab mixed" and you care, you could look. That's my thinking anyway. I'd want to stick a pause after mixed if we did it, but honestly replacing non-breaking spaces is probably enough anyway. I'll go ahead and prepare a PR for the first part of this at least.

ahicks92 added a commit to ahicks92/nvda that referenced this pull request Dec 18, 2015
@ahicks92
Copy link
Contributor Author

This works for the first part. I left it open because I'm not sure if we want to do the second or not.
Sorry about the history, I'm not sure why the pull request is including both commits. I used hub and it says that pull-request to issue conversion is deprecated...but only after it does it. I'm assuming this is the bug, and suffice it to say I will not be doing that again.

@@ -421,6 +422,8 @@ def getIndentationSpeech(indentation):
# Translators: This is spoken when the given line has no indentation.
return _("no indent")

#The non-breaking space is semantically a space, so we replace it here.
indentation = RE_NBSP_CONVERT.sub(" ", indentation)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously totally cool with using regexps if it allows you to do multiple string passes in one step and thus improve performance, but in this case, is there any advantage of using a regexp over a simple:

indentation = indentation.replace(u"\xa0", " ")

@ahicks92
Copy link
Contributor Author

Yeah, actually. "\xa0".replace(u"\xa0", " ") fails because replace tries to decode the unicode literal to Ascii. Passing the ascii representation to the Unicode version of replace also breaks; in that case, I suspect that it's trying to decode the argument to ascii first, but it's hard to tell. The function's docstring doesn't explicitly say that the input is going to be a Unicode string, so you either have to force it (create 2 temporary strings in quick succession) or do an even uglier if statement as far as I can see. There might be a cleaner way, but a simple replace call seems to break on one of the alternatives no matter how I write it, the if statement switching on types seemed ugly, and you seemed concerned that this be absolutely as fast as possible.
I don't discount the possibility that I missed something, me and Unicode do not mix.

@jcsteh
Copy link
Contributor

jcsteh commented Dec 21, 2015

ah, very good point. However, it seems this fails already for Unicode:

>>> speech.getIndentationSpeech(" \xa0")
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "C:\Users\jamie\src\nvda\source\speech.py", line 438, in getIndentationSpeech
    return " ".join(res)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

Given this and the fact that I can't think of a case when it would ever be passed an ANSI string, just change the docstring to say unicode instead of basestring for the parameter type, which should also make the unicode.replace safe.

@ahicks92
Copy link
Contributor Author

That should do it.

@jcsteh
Copy link
Contributor

jcsteh commented Dec 22, 2015

I didn't see this before, but I see you've based this on (and are requesting merge into) next. For future reference, all pull requests should be based on (and submitted against) master; see the Contributing guidelines. No need to resubmit or anything for this one; I'll take care of it.

jcsteh pushed a commit that referenced this pull request Dec 22, 2015
…s normal spaces. Previously, this could cause announcements such as "space space space" instead of "3 space".

Fixes #5610.
@nvaccessAuto
Copy link

Incubated in 67f080d.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants