bpo-24665: Add CJK support in textwrap by default. #5649

JulienPalard · 2018-02-13T01:07:39Z

Related to:

https://bugs.python.org/issue24665

fgallaire · 2018-02-14T23:40:58Z

Lib/textwrap.py

+        width = 0
+        pos = 0
+        for char in text:
+            width += 2 if east_asian_width(char) in {'F', 'W'} else 1


Why inline _len()? I haven't seen any performance issues, and the inlined version is less readable (less Pythonic).

How do you know where to break once you have the whole value?

Hello, I was reading too fast. In my version there is a boolean function, _wide.
So here it would be width += _wide(char) + 1

And _len is just return sum(2 if _wide(char) else 1 for char in text), with no performance issues.

More Pythonic, DRY.

I wouldn't bet on the performance: calling _len from _slice adds two function calls per character (one to _len and one to sum). In one case I'm doing it on a single character, and in the other on a whole string. Yes, I could also factor this ternary out into a third function, but I don't find that more readable.

fgallaire · 2018-02-14T23:43:31Z

Lib/textwrap.py

+            width += 2 if east_asian_width(char) in {'F', 'W'} else 1
+            if width > index:
+                break
+            pos += 1


Why not use enumerate()? The manual counter is less readable (less Pythonic).

Because it does not work with enumerate(): the last increment is not done. I do not remember exactly which case, but if you run the unit tests you'll spot it easily; it was failing. I'll do it if needed, but I can't right now.

I'm interested in that; the code was heavily tested for txt2tags and the tests didn't catch this problem.

Your initial implementation worked thanks to your if cjk_len(text) <= index: return text, '' check, which handles the special case explicitly; I may have tried to avoid it.

"Explicit is better than implicit." But the most important thing is that both solutions are correct.
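One plausible reading of the enumerate() problem discussed above is sketched below. The function names are hypothetical and the bodies are reconstructed from the diff fragment and the comments, so this is an illustration under those assumptions rather than either author's actual code.

```python
from unicodedata import east_asian_width


def _char_width(char):
    """Columns used by one character (2 for Fullwidth/Wide, else 1)."""
    return 2 if east_asian_width(char) in {'F', 'W'} else 1


def slice_manual(text, index):
    """Split text so the head fits in `index` columns (manual counter)."""
    width = 0
    pos = 0
    for char in text:
        width += _char_width(char)
        if width > index:
            break
        pos += 1  # only counted once the character is known to fit
    return text[:pos], text[pos:]


def slice_enumerate_naive(text, index):
    """Naive enumerate() variant: wrong when the whole text fits."""
    width = 0
    pos = 0  # keeps pos bound when text is empty
    for pos, char in enumerate(text):
        width += _char_width(char)
        if width > index:
            break
    # If the loop never breaks, pos is the index of the last character,
    # so that character is wrongly pushed into the tail.
    return text[:pos], text[pos:]


def slice_enumerate_guarded(text, index):
    """enumerate() variant with the whole-text case handled explicitly."""
    if sum(_char_width(char) for char in text) <= index:
        return text, ''
    width = 0
    pos = 0
    for pos, char in enumerate(text):
        width += _char_width(char)
        if width > index:
            break
    return text[:pos], text[pos:]
```

With these definitions, slice_manual and slice_enumerate_guarded agree, while slice_enumerate_naive drops the last character into the tail whenever the whole text fits within index columns.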

fgallaire · 2018-02-14T23:49:24Z

Don't see my author credit

fgallaire · 2018-02-15T00:05:29Z

And you missed the if self.width <= 0: bug fixed in #89

Co-authored-by: Florent Gallaire <[email protected]>

JulienPalard · 2018-03-06T22:59:05Z

And you missed the if self.width <= 0: bug fixed in #89

You're right! And trying to split a wide character leads to an infinite loop.

Don't see my author credit

Gladly fixed, and I added you as co-author.

fgallaire · 2018-03-06T23:07:30Z

Thanks @JulienPalard, I'm so happy! I had almost lost hope of seeing this issue fixed.

fgallaire · 2018-03-06T23:21:04Z

Lib/textwrap.py

+        if self.width <= 0:
+            raise ValueError("invalid width %r (must be > 0)" % self.width)
+        elif self.width == 1 and _width(text) > len(text):
+            raise ValueError("invalid width 1 (must be > 1 when CJK chars)")


I have done a more complex solution:

elif self.width == 1 and (sum(self._width(chunk) for chunk in chunks) > sum(len(chunk) for chunk in chunks)):

It raises the exception earlier, but it's probably not strictly necessary.
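The validation in the diff above can be exercised in isolation with a small sketch. Here check_width is a hypothetical free function standing in for the method body, and _width is assumed to compute display width with wide characters counted as two columns, as elsewhere in this PR.

```python
from unicodedata import east_asian_width


def _width(text):
    """Display width of text, counting Fullwidth/Wide characters as 2."""
    return sum(2 if east_asian_width(c) in {'F', 'W'} else 1 for c in text)


def check_width(width, text):
    """Reject widths that cannot hold the text's widest character."""
    if width <= 0:
        raise ValueError("invalid width %r (must be > 0)" % width)
    elif width == 1 and _width(text) > len(text):
        # A two-column character can never fit in one column, so the
        # wrapping loop would otherwise never make progress.
        raise ValueError("invalid width 1 (must be > 1 when CJK chars)")
```

The second branch relies on _width(text) exceeding len(text) exactly when the text contains at least one wide character.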

terryjreedy

The change I request is that this be closed because it is conceptually wrong. Textwrap works in terms of abstract 'characters' (code points), not physical units. I will explain this on the issue.

Aside from that, 2 is the wrong number to add, as 'double-width' characters are not actually twice as wide as fixed-pitch ASCII chars of the same height. See the issue for this as well.
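The code point behaviour described in this review can be observed with the existing textwrap module. The sketch below is only an illustration: the columns helper is hypothetical, and its factor of 2 per wide character is the PR's assumption, which this review disputes.

```python
import textwrap
from unicodedata import east_asian_width


def columns(text):
    """Estimated terminal columns, assuming 2 per Fullwidth/Wide character."""
    return sum(2 if east_asian_width(c) in {'F', 'W'} else 1 for c in text)


text = "日本語日本語"
lines = textwrap.wrap(text, width=4)

# textwrap measures each line with len(), i.e. in code points.
code_points = [len(line) for line in lines]
estimated_columns = [columns(line) for line in lines]
```

Every wrapped line respects width=4 when measured in code points, while the estimated column count of the first line is twice that.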

bedevere-bot · 2018-07-08T20:08:04Z

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

the-knights-who-say-ni added the CLA signed label Feb 13, 2018

bedevere-bot added the awaiting merge label Feb 13, 2018

ned-deily requested a review from larryhastings February 13, 2018 03:19

fgallaire reviewed Feb 14, 2018

JulienPalard force-pushed the textwrap-cjk branch 2 times, most recently from 45fd84d to 4623375 Compare March 6, 2018 22:42

bpo-24665: Add CJK support in textwrap by default.

57b2882

Co-authored-by: Florent Gallaire <[email protected]>

JulienPalard force-pushed the textwrap-cjk branch from 4623375 to 57b2882 Compare March 6, 2018 22:43

fgallaire reviewed Mar 6, 2018

JulienPalard requested a review from vstinner March 28, 2018 21:16

terryjreedy requested changes Jul 8, 2018

bedevere-bot added awaiting changes and removed awaiting merge labels Jul 8, 2018

methane closed this Jul 11, 2018

JulienPalard deleted the textwrap-cjk branch June 16, 2019 14:07
