-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
bpo-24665: Add CJK support in textwrap by default. #5649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
width = 0 | ||
pos = 0 | ||
for char in text: | ||
width += 2 if east_asian_width(char) in {'F', 'W'} else 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why inlining _len(), I don't have seen performance issues and it's less readable (less pythonic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you know where to break once you have the whole value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello I was reading too fast. In my version there's the _wide
boolean function.
So here width += _wide(char) + 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And _len
is just return sum(2 if _wide(char) else 1 for char in text)
with no performance issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More pythonic, DRY.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't bet on the performances, calling _len from _slice adds two functions calls per character (one to _len and one to sum). In one case I'm doing it on a character, and in the other case in a whole string. Yes I could also factorize this ternary to a third function, but I don't find it more readable.
width += 2 if east_asian_width(char) in {'F', 'W'} else 1 | ||
if width > index: | ||
break | ||
pos += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why note use enumerate(), it's less readable (less pythonic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it does not works with enumerate as the last incrementation were not done. I do not remember which case exactly but if you run the unit test you'll spot it easily, it was failing, I'll do if needed but can't right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interested in that, the code was strongly tested for txt2tags and don't catch this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your initial implementation was working thanks to your if cjk_len(text) <= index: return text, ''
fixing the special case explicitly, I may have tried to avoid it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Explicit is better than implicit." but the more important is that both solutions are correct.
Don't see my author credit |
And you miss the |
45fd84d
to
4623375
Compare
Co-authored-by: Florent Gallaire <[email protected]>
4623375
to
57b2882
Compare
You're right! And trying to split a wide character yield to an infinite loop.
Gladly fixed and co-authored you. |
Thanks @JulienPalard, I'm so happy ! I had almost lost hope to see this issue fixed. |
if self.width <= 0: | ||
raise ValueError("invalid width %r (must be > 0)" % self.width) | ||
elif self.width == 1 and _width(text) > len(text): | ||
raise ValueError("invalid width 1 (must be > 1 when CJK chars)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done a more complex solution:
elif self.width == 1 and (sum(self._width(chunk) for chunk in chunks) >
sum(len(chunk) for chunk in chunks)):
It throws the exception earlier, but it's probably not absolutely necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change I request is that this be closed because it is conceptually wrong. Textwrap works in terms of abstract 'characters' (codepoint), not physical units. I will explain this on the issue.
Aside from that, 2 is the wrong number to add, as 'double-width' characters are not actually twice as wide as fixed-pitch Ascii chars of the same height. See the issue for this as well.
When you're done making the requested changes, leave the comment: |
Related to:
https://bugs.python.org/issue24665