-
Notifications
You must be signed in to change notification settings - Fork 522
Handle requests using absolute-form request URIs #1246
Conversation
{ | ||
// Request URIs can be in absolute form | ||
// See https://tools.ietf.org/html/rfc7230#section-5.3.2 | ||
// This will skip over scheme and authority for determinie path, but preserving the vlues for rawTarget. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits: determinie, vlues
// an absolute URI is not required to end in a slash but must not be empty | ||
// see https://tools.ietf.org/html/rfc3986#section-4.3 | ||
|
||
var pathIndex = scan.Seek(ByteForwardSlash, ByteSpace, ref end); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seek
returns -1 or the first byte it found with one of the specified values. The return value is not an index.
|
||
consumed = scan; | ||
Method = method; | ||
QueryString = queryString; | ||
RawTarget = rawTarget; | ||
RawTarget = requestUriScheme + requestAuthority + rawUrlPath; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you'll see a perf hit from these concatenations. Ideally you should get the raw target in one go and set it here, instead of building it from it's parts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, requestAuthority
might be null, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requestUriScheme and requestAuthority will be null for most requests. Isn't concat with null a noop? Since the perf lab is down right now, I haven't been able to test if this incurs a significant perf hit or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoa, TIL. I thought you'd get a NullReferenceException
when trying to concat a null
.
}); | ||
}); | ||
|
||
using (var host = builder.Build()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use TestServer
{ | ||
host.Start(); | ||
|
||
using (var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use TestConnection
|
||
namespace Microsoft.AspNetCore.Server.Kestrel.FunctionalTests | ||
{ | ||
public static class TaskExtensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[InlineData("http://localhost", true, "")] | ||
[InlineData("http://", false, null)] | ||
[InlineData("https://", false, null)] | ||
public async Task CanHandleRequestsWithUrlInAbsoluteForm(string requestUrl, bool valid, string expectedPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add test cases for bad requests with malformed absolute URLs.
/// <param name="knownScheme">A reference to the known scheme, if the input matches any</param> | ||
/// <returns></returns> | ||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
public static bool GetKnownUriScheme(this MemoryPoolIterator begin, out string knownScheme) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add unit tests (look at existing tests for the other GetKnown
* methods).
string requestAuthority = null; | ||
if (scan.GetKnownUriScheme(out requestUriScheme)) | ||
{ | ||
// Request URIs can be in absolute form |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
example
Log.IsEnabled(LogLevel.Information) ? start.GetAsciiStringEscaped(end, MaxInvalidRequestLineChars) : string.Empty); | ||
} | ||
|
||
// TODO consider handling unicode host names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UTF-8
🆙 📅 |
var chNext = scan.Peek(); | ||
if (chNext == ByteForwardSlash || chNext == ByteSpace) | ||
{ | ||
// an absolute URI is not required to end in a slash but host must not be empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this comment outside the if
@@ -428,59 +429,44 @@ public void CanUpgradeRequestWithConnectionKeepAliveUpgradeHeader() | |||
[InlineData("http://localhost/", true, "/")] | |||
[InlineData("https://localhost/", true, "/")] | |||
[InlineData("http://localhost", true, "")] | |||
// bad requests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Write a separate test method for bad requests. Also, you can add those to the BadHttpRequestTests
class.
"Connection: close\r\n" + | ||
"\r\n"; | ||
await connection.SendAll( | ||
$"GET {requestUrl} HTTP/1.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indent
This implementation ignores the authority section altogether, which means some malformed URI's could still be accepted. Is this okay? e.g. Or do should we do a safe but slow System.Uri check to make sure the full URI is valid? |
begin = scan; | ||
|
||
var targetIsAbsolute = false; | ||
string requestUriScheme; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still not sure why you're creating these strings only to rebuild the rawTarget later. why not capture the full rawTarget all at once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update
return false; | ||
} | ||
|
||
if ((value & _mask7Chars) == _httpSchemeLong) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the spec allow for ws:// or wss:// (WebSockets)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spec is ambiguous. It only says absolute-form = absolute-URI
. Doesn't say anything about needing to ensure the scheme is http
or https
. What does websocket clients normally send?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFC 6455 section 4.1 states that websockets clients are required to send the initial HTTP request using 'origin-form' (normal request format), not 'absolute-form'. I don't think we have a valid client scenario for a request like GET ws://host/path HTTP/1.1
so we're probably okay rejecting this format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically it's not a valid client scenario to send absolute-form targets to the end-server, right? It's only meant to be sent to proxies AFAIK. I recall someone mentioning the spec requires that proxies change the target to origin-form, but end-servers still support absolute-form in case the proxy isn't compliant with the spec.
If we already going through this work to support buggy proxies, I think we should also allow ws/wss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make a distinction here, we support accepting absolute-form b/c the spec requires it and some http clients may send it. I'm in favor of rejecting anything with 'ws://' or 'wss:// because that is a buggy WebSockets client.
So long as we're not acting on the unvalidated values it should be fine. |
[InlineData("GET http:// HTTP/1.1\r\n")] | ||
[InlineData("GET https:// HTTP/1.1\r\n")] | ||
[InlineData("GET http:/// HTTP/1.1\r\n")] | ||
[InlineData("GET http://// HTTP/1.1\r\n")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add:
- Not starting with
http://
orhttps://
http
https
http:
https:/
http:/
https:/
http://<space>
https://<space>
http:// /
https:// /
[InlineData("HtTp:// ", false, null)] | ||
[InlineData("https://", true, MemoryPoolIteratorExtensions.HttpsScheme)] | ||
[InlineData("https://abc", true, MemoryPoolIteratorExtensions.HttpsScheme)] | ||
public void GetKnownHttpSchema(string input, bool expectedResult, string expectedString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Gets
[InlineData("https://differenthost/abs/path", "/abs/path")] // handles mismatched hostname | ||
[InlineData("http://localhost/", "/")] | ||
[InlineData("https://localhost/", "/")] | ||
[InlineData("http://localhost", "")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide things like http://:80
are ok, you should add it here.
@@ -52,12 +52,26 @@ public class BadHttpRequestTests | |||
[InlineData("} / HTTP/1.0\r\n")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test somewhere for OPTIONS * HTTP/1.1
? Path should be empty and RawTarget should be *.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes: KestrelTests.RequestTargetProcessingTests.NonPathRequestTargetSetInRawTarget
.
if (targetBegin.Peek() != ByteForwardSlash) | ||
{ | ||
string requestUriScheme; | ||
if (scan.GetKnownHttpSchema(out requestUriScheme)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given what we discussed, I don't think you need this method anymore - we're not really looking for any particular schemas, since we don't do anything about them. We do want to make sure it's a valid URI, otherwise we could just accept any kind of <garbage>/a/b/c
, and that wouldn't make sense.
So here you could check if there isn't a /
, like you're doing, and if there isn't this is the slow path and you grab the string from targetBegin
to the slash and check if that's a valid URI (like you did below). Otherwise you're in the hot path and you should do things as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetKnownHttpSchema
skips an allocation for the Scheme string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as Ben said, one goal was to minimize allocations.
Also, AFAIK there are no valid scenarios using absolute-form and schemes other than http://
andhttps://
. Kestrel should reject requests like GET otherscheme://localhost/ HTTP/1.1
as invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scheme string wouldn't need to be allocated even if we supported unknown schema. The out parameter could be the length of the schema string since the schema from the target isn't stored anywhere.
Better yet, this could be like GetKnownVersion
and advance the iterator itself. I see that GetKnownMethod
doesn't advance the iterator. I would change that too for consistency.
@@ -1021,8 +1025,34 @@ public RequestLineStatus TakeStartLine(SocketInput input) | |||
scan.Skip(method.Length); | |||
} | |||
|
|||
scan.Take(); | |||
scan.Take(); // consume space |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Put comment above line.
if (scan.Seek(ByteForwardSlash, ByteSpace, ref end) == -1) | ||
{ | ||
RejectRequest(RequestRejectionReason.InvalidRequestLine, | ||
Log.IsEnabled(LogLevel.Information) ? start.GetAsciiStringEscaped(end, MaxInvalidRequestLineChars) : string.Empty); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Make a helper method for RejectRequest(RequestRejectionReason.InvalidRequestLine, startLineString)
that takes the start and end iterators instead of a string so we don't see the ternary expression all over this method.
@@ -1058,13 +1088,15 @@ public RequestLineStatus TakeStartLine(SocketInput input) | |||
|
|||
var queryEnd = scan; | |||
|
|||
if (pathBegin.Peek() == ByteSpace) | |||
if (pathBegin.Peek() == ByteSpace && targetBegin.Index == pathBegin.Index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what's the value of Path
if the start line is GET http://www.example.org HTTP/1.1
? string.Empty
?
This seems weird since GET HTTP/1.1
is invalid. I think we should set Path to /
or reject the request. Not sure which.
scan.Take(); | ||
scan.Take(); // consume space | ||
|
||
// begin consuming HTTP-version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the new comments. Same nit about them being on their own line.
@@ -1103,7 +1135,7 @@ public RequestLineStatus TakeStartLine(SocketInput input) | |||
if (needDecode) | |||
{ | |||
// Read raw target before mutating memory. | |||
rawTarget = pathBegin.GetAsciiString(ref queryEnd); | |||
rawTarget = targetBegin.GetAsciiString(ref queryEnd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if there's a %
in the host portion of the absolute target?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC there RawTarget has always been the ASCII representation. Since we ignore the authority portion of the absolute-URI, we never bother to look for a %
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
% doesn't really apply to hosts anyways, they prefer punycode.
} | ||
else if (rawTarget[0] == '/') // check rawTarget since normalizedTarget can be "" or "/" after dot segment removal | ||
else if (requestUrlPath?.Length > 0 && requestUrlPath[0] == '/') // check requestUrlPath since normalizedUrlPath can be "" or "/" after dot segment removal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requestUrlPath
can't be null, right? Like I mentioned in another comment, I don't like allowing requestUrlPath
to ever be empty either.
I hadn't considered dot segment removal before though. I think if the dot segment removal ever causes the path to go past the root, e.g. /../
, I would rather reject the request rather than set Path
and QueryString
to string.Empty
.
return false; | ||
} | ||
|
||
if ((value & _mask7Chars) == _httpSchemeLong) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically it's not a valid client scenario to send absolute-form targets to the end-server, right? It's only meant to be sent to proxies AFAIK. I recall someone mentioning the spec requires that proxies change the target to origin-form, but end-servers still support absolute-form in case the proxy isn't compliant with the spec.
If we already going through this work to support buggy proxies, I think we should also allow ws/wss.
[InlineData("http://[email protected]/path", "/path")] | ||
[InlineData("http://root:[email protected]/path", "/path")] | ||
[InlineData("https://localhost/", "/")] | ||
[InlineData("http://localhost", "")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice that you added a test for this. Don't like the result 😛
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, path should not be empty. reject or set it to /
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm favor of keeping it and setting to "/". The HTTP spec says is absolute-form = absolute-URI
, and absolute URI's don't require the trailing slash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
if (targetBegin.Peek() != ByteForwardSlash) | ||
{ | ||
string requestUriScheme; | ||
if (scan.GetKnownHttpSchema(out requestUriScheme)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scheme string wouldn't need to be allocated even if we supported unknown schema. The out parameter could be the length of the schema string since the schema from the target isn't stored anywhere.
Better yet, this could be like GetKnownVersion
and advance the iterator itself. I see that GetKnownMethod
doesn't advance the iterator. I would change that too for consistency.
} | ||
|
||
begin = scan; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else
reject.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we don't want to reject yet. e.g. 'asterisk-form'. is still okay.
aaf80e7
to
68b914c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits and some comments on tests. After that and TODOs we're good to go.
// Start-line can contain uri in absolute form. e.g. 'GET http://contoso.com/favicon.ico HTTP/1.1' | ||
// Clients should only send this to proxies, but the spec requires we handle it anyways. | ||
// cref https://tools.ietf.org/html/rfc7230#section-5.3 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove this blank line
// validation of absolute-form URI may be slow, but clients | ||
// should not be sending this form anyways, so perf optimization | ||
// not high priority | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove this blank line
[InlineData("http://localhost", "/")] | ||
[InlineData("http://127.0.0.1/", "/")] | ||
[InlineData("http://[::1]/", "/")] | ||
[InlineData("http://[::1]:8080/", "/")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add cases with:
- Query string
- Percent encoding
- Both
{ | ||
using (var connection = server.CreateConnection()) | ||
{ | ||
await connection.SendAll( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use Send
"Host: localhost", | ||
"", | ||
""); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
connection.Receive
expected response
[InlineData("GET / / HTTP/1.1\r\n")] | ||
[InlineData("GET http:// HTTP/1.1\r\n")] | ||
[InlineData("GET https:// HTTP/1.1\r\n")] | ||
[InlineData("GET https:// / HTTP/1.1\r\n")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these here? The version is valid in the new test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are here because there is a space or a slash where the HTTP version should begin.
[InlineData("http://[::1]", "/")] | ||
[InlineData("http://[::1]/path", "/path")] | ||
[InlineData("http://[::1]:8080/", "/")] | ||
[InlineData("http://user@[::1]:8080/", "/")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add cases with query string and percent encoding here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you miss this one?
[InlineData("http://abc", true, MemoryPoolIteratorExtensions.HttpScheme)] | ||
[InlineData("http:// ", true, MemoryPoolIteratorExtensions.HttpScheme)] | ||
[InlineData("http:/ ", false, null)] | ||
[InlineData("HtTp:// ", false, null)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add wrong casing tests for https too.
[InlineData("http:/ ", false, null)] | ||
[InlineData("HtTp:// ", false, null)] | ||
[InlineData("https://", true, MemoryPoolIteratorExtensions.HttpsScheme)] | ||
[InlineData("https://abc", true, MemoryPoolIteratorExtensions.HttpsScheme)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test same conditions on http and https. You're not testing https:/ nor followed by space.
For both: http:
and https:
🆙 📅 |
if ((value & _mask7Chars) == _httpSchemeLong) | ||
{ | ||
knownScheme = HttpScheme; | ||
begin.Skip(knownScheme.Length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this doing anything? Isn't begin
a copied struct that isn't used at all afterwards?
/// <param name="knownScheme">A reference to the known scheme, if the input matches any</param> | ||
/// <returns>True when memory starts with known http or https schema</returns> | ||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
public static bool GetKnownHttpSchema(this MemoryPoolIterator begin, out string knownScheme) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the out parameter ever used? Would HasKnownHttpSchema be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the out param is used.
// TODO invalidate requests that don't meet the HTTP spec. See https://github.com/aspnet/KestrelHttpServer/issues/1279 | ||
//[InlineData("GET otherscheme://host/ HTTP/1.1\r\n")] | ||
//[InlineData("GET ws://host/ HTTP/1.1\r\n")] | ||
//[InlineData("GET wss://host/ HTTP/1.1\r\n")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is an absolute-form request with a non HTTP(S) scheme, does Path become string.Empty since it doesn't start with '/' while not being rejected?
Currently, yes. That is the behavior, and was I opened #1279 to address.
|
Yeah this can be removed. This method can be HasKnownHttpMethod instead, but we still need an out param to get the length that should be skipped. Will update the PR
|
An absolute-form request URI has a start line in form: "GET http://host/path HTTP/1.1". RFC 7230 section 5.3.2 stipulates that servers should allow absolute-form request URIs. This change will handles requests using absolute-form. The scheme and authority section of the absolute URI are ignored, but will still appear in IHttpRequestFeature.RawTarget. Resolves #666
f84d50d
to
5b1d778
Compare
@natemcmaster Can't you skip inside the method itself? |
cc @pakrym this probably overlaps with work you are doing. Ok if I put this in before? Or would it be better to hold off? |
An absolute-form request URI has a start line in form: "GET http://host/path HTTP/1.1".
RFC 7230 section 5.3.2 stipulates that servers should allow absolute-form request URIs.
This change will handles requests using absolute-form. The scheme and authority
section of the absolute URI are ignored, but will still appear in
IHttpRequestFeature.RawTarget
.Resolves #666
cc @CesarBS @Tratcher