make sure we work with unicode #6
Comments
Byte positions should be fine, and they're also more performant than character indices.
I think rustfmt should use char positions at least for the line width. Otherwise, writing error messages in a native language means having to set max_width = 180.
When using many unicode characters in a line I get a series of:
Even though my line is only 71 characters long.
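To illustrate the mismatch being reported, here is a minimal sketch of how the same line can have two different "lengths". The helper names `byte_len` and `char_len` are made up for this example and are not taken from rustfmt's source.

```rust
/// Length of the line in UTF-8 bytes (this is what `str::len` reports).
fn byte_len(line: &str) -> usize {
    line.len()
}

/// Length of the line in Unicode scalar values (`char`s).
fn char_len(line: &str) -> usize {
    line.chars().count()
}

fn main() {
    // Each Cyrillic letter here takes two bytes in UTF-8.
    let line = "let msg = \"ошибка\";";
    println!("bytes: {}, chars: {}", byte_len(line), char_len(line));
}
```

A width check based on the byte count would treat a line like this as longer than it looks in an editor, which would fit the "only 71 characters long" report above.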
Resolves rust-lang#1335. Does not attempt to handle a `\r` not followed by a `\n` nor attempt to handle Unicode intricacies (rust-lang#6) including zero-width or multi-byte characters.
It appears that rustfmt replaces many non-unicode chars (such as
@czipperz Can you share an example where this happens?
Yes. https://github.com/czipperz/rust-comp/blob/b1e3df1f7a04f99e0c0a7bac7f97d715c43ab187/rust-comp-front/src/pos.rs#L78 . It's possible it's a problem with emacs. When I run
@czipperz Looks like it's emacs related indeed. Maybe if you find out how emacs calls rustfmt, we can reproduce the bug if it really is on the rustfmt side.
The code in this gist was formatted with rustfmt in the Rust playground. As can be seen, the comments behind the strings containing (non-ASCII) unicode characters seem rather haphazardly "aligned". I believe this to be related to this issue, and the likely cause is the use of byte lengths instead of unicode string lengths, but do correct me if I'm wrong. (Yes, this particular example might be a bit of a niche case, but I can imagine more legitimate situations where similar issues would arise.)
Running into the same issue as @hcsch with a codebase at work: the unicode strings are causing the comment alignment to go haywire.
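For anyone trying to picture the alignment problem, the following sketch shows how padding computed from byte lengths drifts once a line contains multi-byte characters. `align_comment`, `bytes` and `chars` are hypothetical helpers written for this comment. This is an assumption about the likely cause, not rustfmt's actual alignment code.

```rust
/// Pads `code` with spaces so that a trailing comment starts at `column`,
/// using `measure` to decide how wide `code` already is.
fn align_comment(code: &str, comment: &str, column: usize, measure: fn(&str) -> usize) -> String {
    let used = measure(code);
    // Always leave at least one space before the comment.
    let padding = column.saturating_sub(used).max(1);
    format!("{}{}{}", code, " ".repeat(padding), comment)
}

/// Width measured in UTF-8 bytes.
fn bytes(s: &str) -> usize {
    s.len()
}

/// Width measured in `char`s.
fn chars(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // Both items are six `char`s wide, but the second one is nine bytes long.
    for code in ["\"abc\",", "\"где\","] {
        println!("{}", align_comment(code, "// by bytes", 12, bytes));
        println!("{}", align_comment(code, "// by chars", 12, chars));
    }
}
```

With `bytes`, the comment after the Cyrillic string gets less padding than the one after the ASCII string. With `chars`, both comments start at the same column.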
In particular, we use byte positions where we should use char positions in many places. Furthermore, when we do use char positions, we don't check the 'physical' width of the character.
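The three measures mentioned here (bytes, chars, 'physical' width) can be compared with a small sketch. The width function below is a deliberately crude approximation that only covers a few hand-picked code point ranges. `approx_char_width` and `approx_width` are hypothetical names, and a real implementation would presumably rely on full Unicode width tables, for example via a crate such as `unicode-width`.

```rust
/// Very rough display-width estimate for a single `char`.
/// ASSUMPTION: only a few ranges are handled; this is not a complete table.
fn approx_char_width(c: char) -> usize {
    match c as u32 {
        // Combining diacritical marks, zero-width space and joiners.
        0x0300..=0x036F | 0x200B..=0x200D => 0,
        // Hiragana/Katakana, CJK unified ideographs, Hangul syllables,
        // fullwidth forms: usually rendered two columns wide.
        0x3040..=0x30FF | 0x4E00..=0x9FFF | 0xAC00..=0xD7A3 | 0xFF01..=0xFF60 => 2,
        _ => 1,
    }
}

/// Approximate number of terminal columns occupied by `s`.
fn approx_width(s: &str) -> usize {
    s.chars().map(approx_char_width).sum()
}

fn main() {
    // "e" followed by a combining acute accent, and a CJK string.
    for s in ["e\u{301}", "日本語"] {
        println!(
            "{:?}: bytes={}, chars={}, approx width={}",
            s,
            s.len(),
            s.chars().count(),
            approx_width(s)
        );
    }
}
```

The point is only that all three numbers can differ for the same string, so switching from byte positions to char positions fixes some cases but not zero-width or double-width characters.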