
gh-48241: Clarify URL needs to be encoded when provided to urlopen and Request #103855


Merged: 4 commits merged into python:main on Apr 26, 2023

Conversation

mblahay
Contributor

@mblahay mblahay commented Apr 25, 2023

Adding a note that the URL needs to be encoded when it is passed to the urlopen function or to the Request class.

This is a documentation change.
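As a hedged illustration of what the note asks for (not taken from the PR itself), one way to end up with an already encoded URL is to build the query string with urllib.parse.urlencode before constructing a Request. The example.com endpoint and the q parameter are placeholders chosen for this sketch.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameter, used only for illustration.
base = "https://example.com/search"

# urlencode turns non-ASCII characters into %XX escapes and spaces into '+'.
query = urlencode({"q": "café au lait"})

url = f"{base}?{query}"
req = Request(url)  # constructing a Request does not touch the network
print(req.full_url)
```

Passing `url` (or `req`) to urlopen would then send a request whose URL is plain ASCII.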

@ghost

ghost commented Apr 25, 2023

All commit authors signed the Contributor License Agreement.
CLA signed

@mblahay
Contributor Author

mblahay commented Apr 25, 2023

@ambv This is ready

Contributor

@ambv ambv left a comment


Do you think we can drop the comma between "encoded" and "URL"?

@arhadthedev arhadthedev added docs Documentation in the Doc dir awaiting review labels Apr 26, 2023
@ambv ambv added the needs backport to 3.11 only security fixes label Apr 26, 2023
@ambv ambv merged commit 44010d0 into python:main Apr 26, 2023
@miss-islington
Contributor

Thanks @mblahay for the PR, and @ambv for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Apr 26, 2023
…pen and Request (pythonGH-103855)

(cherry picked from commit 44010d0)

Co-authored-by: Michael Blahay <[email protected]>
Co-authored-by: Łukasz Langa <[email protected]>
@bedevere-bot

GH-103891 is a backport of this pull request to the 3.11 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.11 only security fixes label Apr 26, 2023
@ambv ambv changed the title gh-48241: Adding note about url needing to be encoded when provided to the urlopen and Request gh-48241: Clarify URL needs to be encoded when provided to urlopen and Request Apr 26, 2023
erlend-aasland pushed a commit that referenced this pull request May 9, 2023
…open and Request (GH-103855) (#103891)

(cherry picked from commit 44010d0)

Co-authored-by: Michael Blahay <[email protected]>
Co-authored-by: Łukasz Langa <[email protected]>
@vadimkantorov

vadimkantorov commented Jan 23, 2024

It is a bit hard to understand what "Open url, which can be either a string containing a valid, properly encoded URL" means, and what the notions "valid" and "properly encoded" refer to. If it accepts a string, there should probably be an RFC reference or some examples of what is properly encoded and how to encode something properly.

My problem: urllib.request.urlopen('https://google.com/?q=…') fails with an encoding error, and urllib.request.urlopen(urllib.parse.quote('https://google.com/?q=…', safe=':/')) fails with a not-found error, because quote also percent-encodes the question mark: 'https://google.com/%3Fq%3D%E2%80%A6'
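Both failure modes described above can be reproduced without network access. This is only a sketch, and it assumes that the first error comes from the URL having to be representable as ASCII when the request is sent. The exact exception raised by urlopen may differ between Python versions.

```python
from urllib.parse import quote

# The query value is the single character U+2026 (horizontal ellipsis).
url = "https://google.com/?q=\u2026"

# A URL with non-ASCII characters cannot be encoded as ASCII.
try:
    url.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Quoting the whole URL escapes '?' and '=' as well, because only ':' and '/'
# are listed as safe, so the query separator is lost.
over_quoted = quote(url, safe=":/")

print(ascii_ok)
print(over_quoted)
```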

I've tried

urlparsed = urllib.parse.urlparse(url)
urlparsed_unicode_sanitized_query = urllib.parse.ParseResult(urlparsed.scheme, urlparsed.netloc, urlparsed.path, urlparsed.params, urllib.parse.quote(urlparsed.query), urlparsed.fragment)
urlopen_url = urllib.parse.urlunparse(urlparsed_unicode_sanitized_query)

but this also fails if the query string is already URL-encoded, because the existing escapes get encoded a second time :(
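One possible workaround for the double-encoding problem, sketched here as an assumption rather than as anything the documentation or the PR recommends, is to quote each URL component separately while listing '%' and the reserved delimiters as safe. The helper name encode_url is hypothetical. The sketch leaves the host untouched (non-ASCII hostnames would need IDNA handling) and cannot tell a literal '%' from the start of an existing escape.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched in each component. '%' is included so that
# existing %XX escapes are not encoded a second time (assumption: every '%'
# in the input already starts a valid escape).
_PATH_SAFE = "/%:@!$&'()*+,;="
_QUERY_SAFE = "/?%:@!$&'()*+,;="


def encode_url(url):
    """Hypothetical helper: percent-encode non-ASCII and unsafe characters
    in the path, query, and fragment of url without re-encoding escapes."""
    parts = urlsplit(url)
    path = quote(parts.path, safe=_PATH_SAFE)
    query = quote(parts.query, safe=_QUERY_SAFE)
    fragment = quote(parts.fragment, safe=_QUERY_SAFE)
    # Scheme and netloc are passed through unchanged.
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


raw = encode_url("https://google.com/?q=\u2026")
again = encode_url(raw)  # applying the helper twice gives the same result
print(raw)
print(again)
```

Because '%' is treated as safe, the function is idempotent on its own output, which is the property the attempt above was missing.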

Labels
docs Documentation in the Doc dir

6 participants