Skip to content

gh-131878: Fix input of unicode characters with two or more code points in new pyrepl on Windows #131901

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

sergey-miryanov
Copy link
Contributor

@sergey-miryanov sergey-miryanov commented Mar 30, 2025

If unicode characters with two or more codepoints (ñ or é) typed or pasted, then it should be converted to bytes before passing to eventqueue.push.

Also fixed handling of exceptions while decoding buffer. If exception occurred then buffer should be flushed to prevent mixing it with following control commands (for example "\x1b[201").

- with two codepoints or more
@python-cla-bot
Copy link

python-cla-bot bot commented Apr 6, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

@chris-eibl
Copy link
Member

LGTM. I cannot do a thorough review, but at least confirm, that with this PR I can enter äÄöÖüÜ, etc, again 🚀

Copy link
Member

@tomasr8 tomasr8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested with a Czech keyboard and everything works. Thanks!

@chris-eibl
Copy link
Member

LGTM. I cannot do a thorough review, but at least confirm, that with this PR I can enter äÄöÖüÜ, etc, again 🚀

Had some time now: this needs fixes for Linux. Otherwise looks good, thanks @sergey-miryanov!

@tomasr8
Copy link
Member

tomasr8 commented Apr 27, 2025

This breaks Linux, because only one char is inserted in UnixConsole.get_event() [...]

Since this wasn't caught by the tests, we should also probably add a test for that.

@chris-eibl
Copy link
Member

chris-eibl commented Apr 27, 2025

I have suggested the def test_push_unicode_character_two_bytes(self): which tests exactly that.

@sergey-miryanov
Copy link
Contributor Author

@chris-eibl Please take a look! I would like to keep tests for paste mode to be sure that escape sequences not mixed with input if error occurred. But I'm OK to remove them if you think it is not necessary.

@chris-eibl
Copy link
Member

LGTM. Sure, more tests are always better. Though, AFAIU, there is nothing special about paste mode here:

  • test_push_unicode_character_two_bytes_in_paste_mode just asserts that a valid "two byte char" within a sequence of "one byte chars" is working as expected.
  • likewise, test_push_unicode_character_as_str_in_paste_mode asserts that when an exception during a sequence of feeding bytes via push is caught, we can continue pushing afterwards.

IMHO, both are reasonable, but "paste mode" makes them look too special about pasting?

So I'd rename them and use arbitrary "one byte chars" in there.

@sergey-miryanov
Copy link
Contributor Author

Sequence of "one byte chars" is a control command to enable "paste mode". Originally in gh-131878 problem aroise from case where the wrongly passed unicode string puts to the buffer and mixed with following control command that disables paste mode and asserted in those code path. So, I want to check that we don't "corrupt" chars buffer.

@chris-eibl
Copy link
Member

Yupp, I know. But as said, there is nothing special about bracketed pasting here. From the perspective of the eventqueue test, these are just arbitrary bytes. Whether bracketed pasting is working correctly, is tested in

def test_bracketed_paste(self):

If you'd like to showcase that sequences of "one byte chars" interleaved with "multi byte chars" work correctly via "paste mode", then maybe just add a comment?

And small nit: ~ is missing in the escape sequences, see

paste_start = "\x1b[200~"
paste_end = "\x1b[201~"

To me, those two tests have merit, but are too special about "paste mode".

@sergey-miryanov
Copy link
Contributor Author

@chris-eibl It seems that I did not correctly assume how paste-mode works (shame on me!). I removed one test because the same code path is covered by test_push_unicode_character_two_bytes. And renamed and simplified a bit test_push_single_chars_and_unicode_character_as_str (not sure the name is ok though).

@chris-eibl
Copy link
Member

I don't have a better name - the comment in there clarifies what we intend to test (not much tbh).
Just a small nit on the comment.

Otherwise LGTM. Thanks for bearing with me!

@sergey-miryanov
Copy link
Contributor Author

@chris-eibl Thanks for the review and patience!

Copy link
Member

@chris-eibl chris-eibl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @sergey-miryanov!

Copy link
Member

@tomasr8 tomasr8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested again after the simplifications. WFM on both Windows and Linux :)

Co-authored-by: Tomas R. <[email protected]>
Copy link
Contributor

@ambv ambv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gentlemen, excellent work. Not only does it fix the bug, but it makes the codebase simpler and more elegant. If not for the new test, the net number of lines added here would be negative. Impressive!

@ambv ambv merged commit 0c5151b into python:main May 5, 2025
52 checks passed
@ambv ambv added the needs backport to 3.13 bugs and security fixes label May 5, 2025
@miss-islington-app
Copy link

Thanks @sergey-miryanov for the PR, and @ambv for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Sorry, @sergey-miryanov and @ambv, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 0c5151bc81ec8e8588bef4389df12a9ab50e9fa0 3.13

@sergey-miryanov
Copy link
Contributor Author

@ambv Please take a look - this is the same problem with backport as in earlier PR - #130805 (comment)

ambv pushed a commit to ambv/cpython that referenced this pull request May 5, 2025
…ore code points in new pyrepl on Windows (pythongh-131901)

(cherry picked from commit 0c5151b)

Co-authored-by: Sergey Miryanov <[email protected]>
Co-authored-by: Tomas R. <[email protected]>
Co-authored-by: Chris Eibl <[email protected]>
@bedevere-app
Copy link

bedevere-app bot commented May 5, 2025

GH-133468 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label May 5, 2025
ambv added a commit that referenced this pull request May 5, 2025
…de points in new pyrepl on Windows (gh-131901) (gh-133468)

(cherry picked from commit 0c5151b)

Co-authored-by: Sergey Miryanov <[email protected]>
Co-authored-by: Tomas R. <[email protected]>
Co-authored-by: Chris Eibl <[email protected]>
@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 Debian root 3.13 (tier-1) has failed when building commit 891232f.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1441/builds/1169) and take a look at the build logs.
  4. Check if the failure is related to this commit (891232f) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1441/builds/1169

Summary of the results of the build (if available):

Click to see traceback logs
remote: Enumerating objects: 26, done.        
remote: Counting objects:   3% (1/26)        
remote: Counting objects:   7% (2/26)        
remote: Counting objects:  11% (3/26)        
remote: Counting objects:  15% (4/26)        
remote: Counting objects:  19% (5/26)        
remote: Counting objects:  23% (6/26)        
remote: Counting objects:  26% (7/26)        
remote: Counting objects:  30% (8/26)        
remote: Counting objects:  34% (9/26)        
remote: Counting objects:  38% (10/26)        
remote: Counting objects:  42% (11/26)        
remote: Counting objects:  46% (12/26)        
remote: Counting objects:  50% (13/26)        
remote: Counting objects:  53% (14/26)        
remote: Counting objects:  57% (15/26)        
remote: Counting objects:  61% (16/26)        
remote: Counting objects:  65% (17/26)        
remote: Counting objects:  69% (18/26)        
remote: Counting objects:  73% (19/26)        
remote: Counting objects:  76% (20/26)        
remote: Counting objects:  80% (21/26)        
remote: Counting objects:  84% (22/26)        
remote: Counting objects:  88% (23/26)        
remote: Counting objects:  92% (24/26)        
remote: Counting objects:  96% (25/26)        
remote: Counting objects: 100% (26/26)        
remote: Counting objects: 100% (26/26), done.        
remote: Compressing objects:  33% (1/3)        
remote: Compressing objects:  66% (2/3)        
remote: Compressing objects: 100% (3/3)        
remote: Compressing objects: 100% (3/3), done.        
remote: Total 14 (delta 12), reused 12 (delta 11), pack-reused 0 (from 0)        
From https://github.com/python/cpython
 * branch                    3.13       -> FETCH_HEAD
Note: switching to '891232f3386dd8b20a216a473954c1b01cede7ec'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 891232f3386 [3.13] gh-131878: Fix input of unicode characters with two or more code points in new pyrepl on Windows (gh-131901) (gh-133468)
Switched to and reset branch '3.13'

configure: WARNING: no system libmpdecimal found; falling back to bundled libmpdecimal (deprecated and scheduled for removal in Python 3.15)
configure: WARNING: pkg-config is missing. Some dependencies may not be detected correctly.

find: ‘build’: No such file or directory
find: ‘build’: No such file or directory
find: ‘build’: No such file or directory
find: ‘build’: No such file or directory
make: [Makefile:3116: clean-retain-profile] Error 1 (ignored)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants