urlparse goes wrong with IP:port without scheme #38644
urlparse doesn't work if IP and port are given without a scheme:

>>> urlparse.urlparse('1.2.3.4:80','http')
('1.2.3.4', '', '80', '', '', '')

should be:

>>> urlparse.urlparse('1.2.3.4:80','http')
('http', '1.2.3.4', '80', '', '', '') |
Logged In: YES urlparse.urlparse takes a URL of the format: And returns a 6-tuple of the format: An example from the library reference takes: |
Logged In: YES

Sorry, previous comment got cut off...

urlparse.urlparse takes a URL of the format: And returns a 6-tuple of the format: An example from the library reference takes: And produces:

--------------------------------

Note that there isn't a field for the port number in the

>>> urlparse.urlparse('1.2.3.4:80','http')
('http', '1.2.3.4:80', '', '', '', '')

Instead, it gives the incorrect output as you indicated. |
Logged In: YES

Ok, I researched this a bit, and the situation isn't as It seems to me that the source code follows RFC 1808

>>> urlparse.urlparse('python.org')
('', '', 'python.org', '', '', '')
>>> urlparse.urlparse('python.org', 'http')
('http', '', 'python.org', '', '', '')

Note that it is putting 'python.org' as the path and not the

>>> urlparse.urlparse('//python.org')
('', 'python.org', '', '', '', '')
>>> urlparse.urlparse('//python.org', 'http')
('http', 'python.org', '', '', '', '')

So here it does the correct thing. There are two problems though. First, it is common for So somebody needs to make a decision. Should urlparse follow In any case, you can temporarily solve your problem by

>>> urlparse.urlparse('//1.2.3.4:80', 'http')
('http', '1.2.3.4:80', '', '', '', '') |
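The workaround in the comment above (marking the host with a leading `//`) can be sketched for current Python. This assumes Python 3's `urllib.parse`, the successor of the Python 2 `urlparse` module used in this thread. The helper name `parse_host_port` and its `default_scheme` parameter are hypothetical and not part of the standard library; the `'//' not in address` check is a deliberately simple heuristic.

```python
from urllib.parse import urlparse


def parse_host_port(address, default_scheme="http"):
    """Parse 'host:port' by marking it as a network location with '//'.

    Hypothetical helper for illustration; not part of urllib.parse.
    """
    if "//" not in address:
        # A leading '//' tells urlparse that a network location follows.
        address = "//" + address
    return urlparse(address, default_scheme)


result = parse_host_port("1.2.3.4:80")
print(result.scheme, result.netloc, result.hostname, result.port)
```

With the `//` prefix the whole `1.2.3.4:80` string lands in `netloc`, and the `hostname` and `port` attributes of the result split it further.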
Logged In: YES The problem is still present in Py2.3.4. IMO, it should support dirs without the "http://" or raise |
Logged In: YES Will look into it. Should be easy to fix. |
Attaching the patch to fix this issue. I deliberated upon this for a
If we go for any other fix, like internally pre-pending // when user has Let me know your thoughts on this.

>>> urlparse('1.2.3.4:80')
ParseResult(scheme='', netloc='', path='1.2.3.4:80', params='',
query='', fragment='')
>>> urlparse('http://www.python.org:80/~guido/foo?query#fun')
ParseResult(scheme='http', netloc='www.python.org:80',
path='/~guido/foo', params='', query='query', fragment='fun')
|
Senthil, your patch is wrong, see:

>>> import urlparse
>>> urlparse.urlparse('1.2.3.4:80','http')
ParseResult(scheme='http', netloc='', path='1.2.3.4:80', params='',
query='', fragment='')

The netloc should be "1.2.3.4:80"; note the composition of a URL:

<scheme>://<netloc>/<path>;<params>?<query>#<fragment>

Please fix it and test it applying the patch to the test I'm submitting |
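To make the URL composition quoted above concrete, here is a small sketch that prints each of the six components. It assumes Python 3's `urllib.parse` rather than the Python 2 `urlparse` module from the thread, and the `;type=a` parameter is added to the thread's example URL purely for illustration.

```python
from urllib.parse import urlparse

# <scheme>://<netloc>/<path>;<params>?<query>#<fragment>
parts = urlparse("http://www.python.org:80/~guido/foo;type=a?query#fun")

# ParseResult is a named tuple, so field names and values can be paired up.
for name, value in zip(parts._fields, parts):
    print(f"{name:8} = {value!r}")
```

Note that the port stays inside `netloc`; the 6-tuple has no separate port field.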
I agree with facundobatista that the patch is bad, but for a different

>>> import urlparse
>>> urlparse.urlparse('http:')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/anthony/svn/python26/Lib/urlparse.py", line 108, in urlparse
    tuple = urlsplit(url, scheme, allow_fragments)
  File "/home/anthony/svn/python26/Lib/urlparse.py", line 148, in urlsplit
    if i > 0 and not url[i+1].isdigit():
IndexError: string index out of range

I'm afraid that it's not evident that the expected behavior isn't Take for example:

>>> import urlparse
>>> urlparse.urlparse('some.com', 'http')
ParseResult(scheme='http', netloc='', path='some.com', params='',
query='', fragment='')

Is the URL referring to the some.com domain or to a Windows executable file? If you're using urlparse to parse only absolute URLs then probably you It would probably be better to be explicit and raise an exception if the |
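The `IndexError` in the traceback above is raised by `url[i+1]`, which fails when nothing follows position `i`. A minimal sketch of one way to guard such a check, assuming `i` is the index of the first colon (the traceback does not show how `i` is computed). The function name `colon_separates_scheme` is hypothetical, and this is not the patch that was eventually committed.

```python
def colon_separates_scheme(url):
    """Return True if the first colon looks like a scheme separator.

    Hypothetical illustration: a colon followed by a digit is taken to be
    a host:port separator instead.
    """
    i = url.find(":")
    if i <= 0:
        return False
    # A slice never raises; it is simply empty when the colon is last.
    return not url[i + 1:i + 2].isdigit()


for candidate in ("http:", "http://python.org", "1.2.3.4:80", "python.org"):
    print(candidate, colon_separates_scheme(candidate))
```

Slicing instead of indexing keeps the digit test intact while making the trailing-colon input `'http:'` safe.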
I agree with Anthony here, because if you let people write without the So, the better behaviour to be as explicit as possible should be:

>>> urlparse.urlparse('1.2.3.4:80','http')
Traceback!!! ValueError(<nice message here>)
>>> urlparse.urlparse('//1.2.3.4:80','http')
('http', '1.2.3.4:80', '', '', '', '')

So, to close this issue, we should fix the code to behave like indicated What do you think? |
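The explicit behaviour proposed above can be approximated outside the library with a thin wrapper. This is only a sketch under stated assumptions: it uses Python 3's `urllib.parse`, the name `strict_urlparse` and the error message are hypothetical, and it treats an empty `netloc` as the error condition.

```python
from urllib.parse import urlparse


def strict_urlparse(url, scheme=""):
    """Parse url, but raise ValueError when no network location is found.

    Hypothetical wrapper for illustration; not part of urllib.parse.
    """
    parts = urlparse(url, scheme)
    if not parts.netloc:
        raise ValueError(
            f"no network location in {url!r}; "
            "prefix the host with '//' or a full scheme"
        )
    return parts


print(tuple(strict_urlparse("//1.2.3.4:80", "http")))
try:
    strict_urlparse("1.2.3.4:80", "http")
except ValueError as exc:
    print("rejected:", exc)
```

A check like this also rejects legitimate relative references such as `a/b`, so it only suits callers that expect absolute URLs.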
I am attaching the modified patch, which addresses the port issue Facundo, I gave sufficient thought on raising an Exception for URLs not As urlparse module is used for handling both absolute URLs as well as The way to inform the users to use '//net_loc' when they want net_loc, This case may seem absurd when 'www.python.org' is treated as path but Another way to handle this would be split urlparse into two methods: Irrespective of this, if the patch looks okay for "handling the port Comments please. |
I think this last patch is ok, but the third case that was raised in the """ Please, address this new detail, and I'd commit this. Thanks! |
Facundo, I re-looked at this issue (after a long time; sorry for that) The Web-SIG discussion, which you pointed to in the comment The suggestion is specifically for "//path" kind of urls, which is will My suggestion is to fix this issue with patch and if the corner case Your comments? |
The patch will need to be reworked for the 2.7, 3.1 and 3.2 branches. |
I've reworked the patch so that it applied against the py3k branch. It's been attached to this issue and is also available here: http://codereview.appspot.com/1910044. |
Fixed in revision 83700 (release27-maint), r83701 (py3k) and r83702 (release31-maint). David, thanks for reworking the patch. Couple of comments
|