Skip to content
This repository was archived by the owner on Dec 18, 2018. It is now read-only.

Handle requests using absolute-form request URIs #1246

Closed
wants to merge 2 commits into from
Closed

Conversation

natemcmaster
Copy link
Contributor

An absolute-form request URI has a start line in form: "GET http://host/path HTTP/1.1".
RFC 7230 section 5.3.2 stipulates that servers should allow absolute-form request URIs.

This change will handles requests using absolute-form. The scheme and authority
section of the absolute URI are ignored, but will still appear in IHttpRequestFeature.RawTarget.

Resolves #666

cc @CesarBS @Tratcher

{
// Request URIs can be in absolute form
// See https://tools.ietf.org/html/rfc7230#section-5.3.2
// This will skip over scheme and authority for determinie path, but preserving the vlues for rawTarget.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: determinie, vlues

// an absolute URI is not required to end in a slash but must not be empty
// see https://tools.ietf.org/html/rfc3986#section-4.3

var pathIndex = scan.Seek(ByteForwardSlash, ByteSpace, ref end);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seek returns -1 or the first byte it found with one of the specified values. The return value is not an index.


consumed = scan;
Method = method;
QueryString = queryString;
RawTarget = rawTarget;
RawTarget = requestUriScheme + requestAuthority + rawUrlPath;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'll see a perf hit from these concatenations. Ideally you should get the raw target in one go and set it here, instead of building it from it's parts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, requestAuthority might be null, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requestUriScheme and requestAuthority will be null for most requests. Isn't concat with null a noop? Since the perf lab is down right now, I haven't been able to test if this incurs a significant perf hit or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa, TIL. I thought you'd get a NullReferenceException when trying to concat a null.

});
});

using (var host = builder.Build())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use TestServer

{
host.Start();

using (var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use TestConnection


namespace Microsoft.AspNetCore.Server.Kestrel.FunctionalTests
{
public static class TaskExtensions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[InlineData("http://localhost", true, "")]
[InlineData("http://", false, null)]
[InlineData("https://", false, null)]
public async Task CanHandleRequestsWithUrlInAbsoluteForm(string requestUrl, bool valid, string expectedPath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add test cases for bad requests with malformed absolute URLs.

/// <param name="knownScheme">A reference to the known scheme, if the input matches any</param>
/// <returns></returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool GetKnownUriScheme(this MemoryPoolIterator begin, out string knownScheme)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add unit tests (look at existing tests for the other GetKnown* methods).

string requestAuthority = null;
if (scan.GetKnownUriScheme(out requestUriScheme))
{
// Request URIs can be in absolute form
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

example

Log.IsEnabled(LogLevel.Information) ? start.GetAsciiStringEscaped(end, MaxInvalidRequestLineChars) : string.Empty);
}

// TODO consider handling unicode host names
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UTF-8

@natemcmaster
Copy link
Contributor Author

🆙 📅

var chNext = scan.Peek();
if (chNext == ByteForwardSlash || chNext == ByteSpace)
{
// an absolute URI is not required to end in a slash but host must not be empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this comment outside the if

@@ -428,59 +429,44 @@ public void CanUpgradeRequestWithConnectionKeepAliveUpgradeHeader()
[InlineData("http://localhost/", true, "/")]
[InlineData("https://localhost/", true, "/")]
[InlineData("http://localhost", true, "")]
// bad requests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write a separate test method for bad requests. Also, you can add those to the BadHttpRequestTests class.

"Connection: close\r\n" +
"\r\n";
await connection.SendAll(
$"GET {requestUrl} HTTP/1.1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indent

@natemcmaster
Copy link
Contributor Author

This implementation ignores the authority section altogether, which means some malformed URI's could still be accepted. Is this okay? e.g. GET http://:80/abc HTTP/1.1 would be accepted.

Or do should we do a safe but slow System.Uri check to make sure the full URI is valid?

begin = scan;

var targetIsAbsolute = false;
string requestUriScheme;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still not sure why you're creating these strings only to rebuild the rawTarget later. why not capture the full rawTarget all at once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update

return false;
}

if ((value & _mask7Chars) == _httpSchemeLong)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the spec allow for ws:// or wss:// (WebSockets)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spec is ambiguous. It only says absolute-form = absolute-URI. Doesn't say anything about needing to ensure the scheme is http or https. What does websocket clients normally send?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC 6455 section 4.1 states that websockets clients are required to send the initial HTTP request using 'origin-form' (normal request format), not 'absolute-form'. I don't think we have a valid client scenario for a request like GET ws://host/path HTTP/1.1 so we're probably okay rejecting this format.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically it's not a valid client scenario to send absolute-form targets to the end-server, right? It's only meant to be sent to proxies AFAIK. I recall someone mentioning the spec requires that proxies change the target to origin-form, but end-servers still support absolute-form in case the proxy isn't compliant with the spec.

If we already going through this work to support buggy proxies, I think we should also allow ws/wss.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make a distinction here, we support accepting absolute-form b/c the spec requires it and some http clients may send it. I'm in favor of rejecting anything with 'ws://' or 'wss:// because that is a buggy WebSockets client.

@Tratcher
Copy link
Member

Tratcher commented Dec 6, 2016

So long as we're not acting on the unvalidated values it should be fine.

[InlineData("GET http:// HTTP/1.1\r\n")]
[InlineData("GET https:// HTTP/1.1\r\n")]
[InlineData("GET http:/// HTTP/1.1\r\n")]
[InlineData("GET http://// HTTP/1.1\r\n")]
Copy link
Contributor

@cesarblum cesarblum Dec 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add:

  • Not starting with http:// or https://
  • http
  • https
  • http:
  • https:/
  • http:/
  • https:/
  • http://<space>
  • https://<space>
  • http:// /
  • https:// /

[InlineData("HtTp:// ", false, null)]
[InlineData("https://", true, MemoryPoolIteratorExtensions.HttpsScheme)]
[InlineData("https://abc", true, MemoryPoolIteratorExtensions.HttpsScheme)]
public void GetKnownHttpSchema(string input, bool expectedResult, string expectedString)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Gets

[InlineData("https://differenthost/abs/path", "/abs/path")] // handles mismatched hostname
[InlineData("http://localhost/", "/")]
[InlineData("https://localhost/", "/")]
[InlineData("http://localhost", "")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide things like http://:80 are ok, you should add it here.

@@ -52,12 +52,26 @@ public class BadHttpRequestTests
[InlineData("} / HTTP/1.0\r\n")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test somewhere for OPTIONS * HTTP/1.1? Path should be empty and RawTarget should be *.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes: KestrelTests.RequestTargetProcessingTests.NonPathRequestTargetSetInRawTarget.

if (targetBegin.Peek() != ByteForwardSlash)
{
string requestUriScheme;
if (scan.GetKnownHttpSchema(out requestUriScheme))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given what we discussed, I don't think you need this method anymore - we're not really looking for any particular schemas, since we don't do anything about them. We do want to make sure it's a valid URI, otherwise we could just accept any kind of <garbage>/a/b/c, and that wouldn't make sense.

So here you could check if there isn't a /, like you're doing, and if there isn't this is the slow path and you grab the string from targetBegin to the slash and check if that's a valid URI (like you did below). Otherwise you're in the hot path and you should do things as before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetKnownHttpSchema skips an allocation for the Scheme string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as Ben said, one goal was to minimize allocations.

Also, AFAIK there are no valid scenarios using absolute-form and schemes other than http:// andhttps://. Kestrel should reject requests like GET otherscheme://localhost/ HTTP/1.1 as invalid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scheme string wouldn't need to be allocated even if we supported unknown schema. The out parameter could be the length of the schema string since the schema from the target isn't stored anywhere.

Better yet, this could be like GetKnownVersion and advance the iterator itself. I see that GetKnownMethod doesn't advance the iterator. I would change that too for consistency.

@@ -1021,8 +1025,34 @@ public RequestLineStatus TakeStartLine(SocketInput input)
scan.Skip(method.Length);
}

scan.Take();
scan.Take(); // consume space
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Put comment above line.

if (scan.Seek(ByteForwardSlash, ByteSpace, ref end) == -1)
{
RejectRequest(RequestRejectionReason.InvalidRequestLine,
Log.IsEnabled(LogLevel.Information) ? start.GetAsciiStringEscaped(end, MaxInvalidRequestLineChars) : string.Empty);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Make a helper method for RejectRequest(RequestRejectionReason.InvalidRequestLine, startLineString) that takes the start and end iterators instead of a string so we don't see the ternary expression all over this method.

@@ -1058,13 +1088,15 @@ public RequestLineStatus TakeStartLine(SocketInput input)

var queryEnd = scan;

if (pathBegin.Peek() == ByteSpace)
if (pathBegin.Peek() == ByteSpace && targetBegin.Index == pathBegin.Index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what's the value of Path if the start line is GET http://www.example.org HTTP/1.1? string.Empty?

This seems weird since GET HTTP/1.1 is invalid. I think we should set Path to / or reject the request. Not sure which.

scan.Take();
scan.Take(); // consume space

// begin consuming HTTP-version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the new comments. Same nit about them being on their own line.

@@ -1103,7 +1135,7 @@ public RequestLineStatus TakeStartLine(SocketInput input)
if (needDecode)
{
// Read raw target before mutating memory.
rawTarget = pathBegin.GetAsciiString(ref queryEnd);
rawTarget = targetBegin.GetAsciiString(ref queryEnd);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if there's a % in the host portion of the absolute target?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC there RawTarget has always been the ASCII representation. Since we ignore the authority portion of the absolute-URI, we never bother to look for a %

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

% doesn't really apply to hosts anyways, they prefer punycode.

}
else if (rawTarget[0] == '/') // check rawTarget since normalizedTarget can be "" or "/" after dot segment removal
else if (requestUrlPath?.Length > 0 && requestUrlPath[0] == '/') // check requestUrlPath since normalizedUrlPath can be "" or "/" after dot segment removal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requestUrlPath can't be null, right? Like I mentioned in another comment, I don't like allowing requestUrlPath to ever be empty either.

I hadn't considered dot segment removal before though. I think if the dot segment removal ever causes the path to go past the root, e.g. /../, I would rather reject the request rather than set Path and QueryString to string.Empty.

return false;
}

if ((value & _mask7Chars) == _httpSchemeLong)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically it's not a valid client scenario to send absolute-form targets to the end-server, right? It's only meant to be sent to proxies AFAIK. I recall someone mentioning the spec requires that proxies change the target to origin-form, but end-servers still support absolute-form in case the proxy isn't compliant with the spec.

If we already going through this work to support buggy proxies, I think we should also allow ws/wss.

[InlineData("http://[email protected]/path", "/path")]
[InlineData("http://root:[email protected]/path", "/path")]
[InlineData("https://localhost/", "/")]
[InlineData("http://localhost", "")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that you added a test for this. Don't like the result 😛

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, path should not be empty. reject or set it to /

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm favor of keeping it and setting to "/". The HTTP spec says is absolute-form = absolute-URI, and absolute URI's don't require the trailing slash.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

if (targetBegin.Peek() != ByteForwardSlash)
{
string requestUriScheme;
if (scan.GetKnownHttpSchema(out requestUriScheme))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scheme string wouldn't need to be allocated even if we supported unknown schema. The out parameter could be the length of the schema string since the schema from the target isn't stored anywhere.

Better yet, this could be like GetKnownVersion and advance the iterator itself. I see that GetKnownMethod doesn't advance the iterator. I would change that too for consistency.

}

begin = scan;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else reject.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we don't want to reject yet. e.g. 'asterisk-form'. is still okay.

Copy link
Contributor

@cesarblum cesarblum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits and some comments on tests. After that and TODOs we're good to go.

// Start-line can contain uri in absolute form. e.g. 'GET http://contoso.com/favicon.ico HTTP/1.1'
// Clients should only send this to proxies, but the spec requires we handle it anyways.
// cref https://tools.ietf.org/html/rfc7230#section-5.3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove this blank line

// validation of absolute-form URI may be slow, but clients
// should not be sending this form anyways, so perf optimization
// not high priority

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove this blank line

[InlineData("http://localhost", "/")]
[InlineData("http://127.0.0.1/", "/")]
[InlineData("http://[::1]/", "/")]
[InlineData("http://[::1]:8080/", "/")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add cases with:

  • Query string
  • Percent encoding
  • Both

{
using (var connection = server.CreateConnection())
{
await connection.SendAll(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use Send

"Host: localhost",
"",
"");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connection.Receive expected response

[InlineData("GET / / HTTP/1.1\r\n")]
[InlineData("GET http:// HTTP/1.1\r\n")]
[InlineData("GET https:// HTTP/1.1\r\n")]
[InlineData("GET https:// / HTTP/1.1\r\n")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these here? The version is valid in the new test cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are here because there is a space or a slash where the HTTP version should begin.

[InlineData("http://[::1]", "/")]
[InlineData("http://[::1]/path", "/path")]
[InlineData("http://[::1]:8080/", "/")]
[InlineData("http://user@[::1]:8080/", "/")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add cases with query string and percent encoding here too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you miss this one?

[InlineData("http://abc", true, MemoryPoolIteratorExtensions.HttpScheme)]
[InlineData("http:// ", true, MemoryPoolIteratorExtensions.HttpScheme)]
[InlineData("http:/ ", false, null)]
[InlineData("HtTp:// ", false, null)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add wrong casing tests for https too.

[InlineData("http:/ ", false, null)]
[InlineData("HtTp:// ", false, null)]
[InlineData("https://", true, MemoryPoolIteratorExtensions.HttpsScheme)]
[InlineData("https://abc", true, MemoryPoolIteratorExtensions.HttpsScheme)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test same conditions on http and https. You're not testing https:/ nor followed by space.

For both: http: and https:

@natemcmaster
Copy link
Contributor Author

🆙 📅

if ((value & _mask7Chars) == _httpSchemeLong)
{
knownScheme = HttpScheme;
begin.Skip(knownScheme.Length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this doing anything? Isn't begin a copied struct that isn't used at all afterwards?

/// <param name="knownScheme">A reference to the known scheme, if the input matches any</param>
/// <returns>True when memory starts with known http or https schema</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool GetKnownHttpSchema(this MemoryPoolIterator begin, out string knownScheme)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the out parameter ever used? Would HasKnownHttpSchema be better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the out param is used.

// TODO invalidate requests that don't meet the HTTP spec. See https://github.com/aspnet/KestrelHttpServer/issues/1279
//[InlineData("GET otherscheme://host/ HTTP/1.1\r\n")]
//[InlineData("GET ws://host/ HTTP/1.1\r\n")]
//[InlineData("GET wss://host/ HTTP/1.1\r\n")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is an absolute-form request with a non HTTP(S) scheme, does Path become string.Empty since it doesn't start with '/' while not being rejected?

@natemcmaster
Copy link
Contributor Author

natemcmaster commented Jan 7, 2017 via email

@natemcmaster
Copy link
Contributor Author

natemcmaster commented Jan 7, 2017 via email

An absolute-form request URI has a start line in form: "GET http://host/path HTTP/1.1".
RFC 7230 section 5.3.2 stipulates that servers should allow absolute-form request URIs.

This change will handles requests using absolute-form. The scheme and authority
section of the absolute URI are ignored, but will still appear in IHttpRequestFeature.RawTarget.

Resolves #666
@cesarblum
Copy link
Contributor

@natemcmaster Can't you skip inside the method itself?

@natemcmaster
Copy link
Contributor Author

cc @pakrym this probably overlaps with work you are doing. Ok if I put this in before? Or would it be better to hold off?

@natemcmaster
Copy link
Contributor Author

In person discussion. Closing this for now as we expect large merge conflicts with work @pakrym is doing, and fixing the bug that this PR addresses is lower priority. We'll come back to this before 2.0.0, after @pakrym's refactorings.

@natemcmaster natemcmaster deleted the namc/fix666 branch March 13, 2017 19:11
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants