-
Notifications
You must be signed in to change notification settings - Fork 107
safe_url_string: escape additional characters #203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #203 +/- ##
==========================================
+ Coverage 95.98% 96.14% +0.16%
==========================================
Files 6 8 +2
Lines 473 493 +20
Branches 90 92 +2
==========================================
+ Hits 454 474 +20
Misses 9 9
Partials 10 10
|
Hey @Gallaecio!
Could you please confirm that my understanding is correct?
|
Co-authored-by: Mikhail Korobov <[email protected]>
Correct. Same when the URL is written by the user directly into the browser address bar. And even when the address bar shows the URL with e.g. non-ASCII characters, in the browser developer tools, on the network tab, you can see the actual URL sent, which includes additional percent-encoding.
The goal of safe_url_string is indeed the one you describe. But I would argue that it is not exactly the goal of the URL living standard. The URL living standard does not seem to try to support servers that follow older standards, and instead seems to expect servers to adapt. Servers that require some characters escaped, even though they do not require escaping according to the standard, must, for example, make sure that all instances of URLs with those characters in their website are escaped, so that users follow the escaped URLs. So, if your server does not support unescaped pipes in URL paths, you can still have URLs with pipes, but they must appear encoded so that, when users follow them, browsers send the pipe percent-encoded. And because the URL standard expect encoded characters to remain encoded, even if they are safe characters according to the standard, things should work. For example, if you enter Chrome seems to try to play it safer, and always encode So safe_url_string diverges from the URL living standard by building URLs that are safe by additional standards, apparently in line with Chrome. URLs where we percent-encode additional characters (to make URLs safe by the definition of additional, older standards) are still compatible with the URL living standard, since its URL parsing does not percent-decode what is already percent-encoded, even if it is a safe character. (our current implementation does percent-decode safe characters, but changing that is not trivial, and should be handled separately, e.g. through #203).
I would say we aim to have a behavior 100% compatible with the URL living standard, but our aim with As servers move on and stop relying on older standards, we can move on and remove support for those, getting closer to the URL living standard serialization rules. However:
Yes and no. The characters that we escape in userinfo bring us closer to the URL living standard. The escaping of single quotes depending on the URL schema does the same. However, the extra characters that we escape on path, query and fragment do not require escaping in the URL living standard, we escape them to align with restrictions from older standards that we know are still used by servers. Incidentally, it also brings us closer to Chrome, which seems to have a similar aim to |
Thanks @Gallaecio, I learned a lot about URLs today! |
Changes:
Make
safe_url_string
percent-encode any character that is not considered safe on any of the URL standards we know to be in use by modern servers:As a result,
:;=
are now percent-encoded in userinfo,|[]
in paths, queries and fragments, and, following the URL living standard,'
is also percent-encoded in the query depending on the URL scheme.The only exception is %, which we should probably encode as
%25
when not followed by 2 hexadecimal digits, but doing so would require major changes to the currentsafe_url_string
implementation that are out of the scope of this change.Add extra tests for
safe_url_string
from Make safe_url_string safer #201, which highlight pending issues in the current implementation, to be addressed separately.Fixes #80.