This repository was archived by the owner on Dec 18, 2018. It is now read-only.

Precomputed header bytes #367

Merged
merged 5 commits into from
Nov 23, 2015

benaadams
Contributor

Encode all the standard preamble once for all requests
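
For illustration, a minimal sketch of the idea, not the code from this PR: the fixed parts of the response preamble are encoded to ASCII once in static fields and then block-copied into each response. The class name `PrecomputedHeaders`, the method `WritePreamble`, and the fixed `200 OK` status are assumptions made for the example.

```csharp
using System;
using System.Text;

// Hypothetical names; these are not the Kestrel types touched by this PR.
public static class PrecomputedHeaders
{
    // Encoded once at type initialization and reused for every response.
    private static readonly byte[] BytesHttpVersion1_1 = Encoding.ASCII.GetBytes("HTTP/1.1 ");
    private static readonly byte[] BytesStatus200 = Encoding.ASCII.GetBytes("200 OK");
    private static readonly byte[] BytesConnectionKeepAlive = Encoding.ASCII.GetBytes("\r\nConnection: keep-alive");
    private static readonly byte[] BytesEndHeaders = Encoding.ASCII.GetBytes("\r\n\r\n");

    // Copies the precomputed preamble into 'buffer' starting at 'offset'
    // and returns the number of bytes written. No per-request encoding happens here.
    public static int WritePreamble(byte[] buffer, int offset)
    {
        int start = offset;
        offset += Copy(BytesHttpVersion1_1, buffer, offset);
        offset += Copy(BytesStatus200, buffer, offset);
        offset += Copy(BytesConnectionKeepAlive, buffer, offset);
        offset += Copy(BytesEndHeaders, buffer, offset);
        return offset - start;
    }

    private static int Copy(byte[] source, byte[] destination, int offset)
    {
        Buffer.BlockCopy(source, 0, destination, offset, source.Length);
        return source.Length;
    }
}
```

The per-request cost in this sketch is a handful of block copies; the string-to-byte encoding is paid once per process.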

@benaadams
Contributor Author

705,348.96 rps -> 1,030,679.42 rps

/cc @davidfowl @halter73

@rynowak
Member

rynowak commented Nov 11, 2015

That's some SRS RPS 👍

@benaadams
Contributor Author

1,069,312.35 on the 3rd wrk run (with Kestrel warmed up)

@benaadams
Contributor Author

If SocketOutput used MemoryPool2 internally and there was a Write(+Async) overload that took (MemoryPoolBlock2 block, int length, ...) and which took ownership of the pooled block (and Released it at write callback); then the copy step at the start of public Task WriteAsync could be skipped for the header preamble.
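
A rough sketch of the ownership-transfer idea described above, under stated assumptions: `ArrayPool<byte>` from `System.Buffers` stands in for `MemoryPool2`, a `Stream` stands in for the socket, and `OwnedBlockWriter` is a hypothetical name. It is not the `SocketOutput` API.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading.Tasks;

// Hypothetical stand-in: ArrayPool<byte> plays the role of MemoryPool2,
// and a Stream plays the role of the socket.
public static class OwnedBlockWriter
{
    // Takes ownership of 'block', which the caller rented from ArrayPool<byte>.Shared.
    // The first 'length' bytes are written directly from the pooled block, so no
    // intermediate copy is made, and the block is returned to the pool once the
    // write has completed (or failed).
    public static async Task WriteAsync(Stream output, byte[] block, int length)
    {
        try
        {
            await output.WriteAsync(block, 0, length);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(block);
        }
    }
}
```

After calling `WriteAsync` the caller must not touch `block` again, since the pool may hand it to another renter.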

@AspNetSmurfLab

New numbers from SmurfLab with this PR, up from ~497k previously:

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.36ms   23.70ms 316.86ms   93.90%
    Req/Sec    18.86k     3.02k   35.23k    77.74%
  5912136 requests in 10.10s, 744.25MB read
  Socket errors: connect 0, read 0, write 337, timeout 0
Requests/sec: 585370.49
Transfer/sec:     73.69MB

@benaadams
Contributor Author

Try non pipelined? I'm also seeing an improvement there.

@benaadams
Contributor Author

Added precomputing Http Version bytes also

@AspNetSmurfLab

Wow. With the addition of 371b2ad we break 600K:

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    11.26ms   25.12ms 265.05ms   92.61%
    Req/Sec    19.57k     3.08k   43.89k    82.55%
  6115424 requests in 10.10s, 769.84MB read
  Socket errors: connect 0, read 0, write 300, timeout 0
Requests/sec: 605502.09
Transfer/sec:     76.22MB

@AspNetSmurfLab

And non-pipelined results:

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.52ms    1.35ms  56.63ms   96.06%
    Req/Sec     5.68k   667.38     8.41k    83.08%
  1778313 requests in 10.10s, 223.86MB read
  Socket errors: connect 0, read 0, write 552, timeout 0
Requests/sec: 176086.22
Transfer/sec:     22.17MB

@benaadams
Contributor Author

Have one last additional item for Server and Date; just need to sort out tests

@AspNetSmurfLab

Latest results, getting higher:

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.50ms   23.15ms 230.54ms   92.07%
    Req/Sec    19.83k     2.93k   35.72k    85.37%
  6204867 requests in 10.10s, 781.10MB read
  Socket errors: connect 0, read 0, write 587, timeout 0
Requests/sec: 614363.71
Transfer/sec:     77.34MB

@benaadams
Contributor Author

Added Server and Date precomputation; while still allowing swap out/clearing

@benaadams benaadams force-pushed the faster-headers branch 15 times, most recently from 1111def to e9f1a8b Compare November 12, 2015 11:16
}
}

public void CopyFromAscii(byte[] data)
Member

Why is this called CopyFromAscii?

Contributor Author

Thought that's what you said? CopyAsciiFrom?

Contributor Author

Ahh; just the string method?

Add ProducingStart and ProducingComplete methods to ISocketOutput.
These new methods can help prevent double buffering when encoding.
@benaadams
Contributor Author

Rebased on #410 doing integration now

/cc @halter73

@lodejard
Contributor

I just filed dotnet/extensions#64 we might want to consider.

It's related to this PR in that it would open a channel from app to server for pre-computed ascii byte[] response headers to be written.

@davidfowl
Member

There's some duplication here with the CopyFrom logic. This looks much cleaner with @halter73's changes

@@ -29,6 +29,17 @@ public partial class Frame : FrameContext, IFrameControl
private static readonly ArraySegment<byte> _emptyData = new ArraySegment<byte>(new byte[0]);
private static readonly byte[] _hex = Encoding.ASCII.GetBytes("0123456789abcdef");

private static readonly byte[] _bytesConnectionClose = Encoding.ASCII.GetBytes("\r\nConnection: close");
private static readonly byte[] _bytesConnectionKeepAlive = Encoding.ASCII.GetBytes("\r\nConnection: keep-alive");
Member

Should move these out of frame into a static class? Similar to reason phrases?

Contributor Author

Less clutter?

@benaadams
Contributor Author

Seems to be spitting out data of the right length, but it's mostly zero bytes; not sure I've integrated this right.

count += end.CopyFrom(_httpVersion == HttpVersionType.Http1_1 ? _bytesHttpVersion1_1 : _bytesHttpVersion1_0);
count += end.CopyFrom(statusBytes);
count += _responseHeaders.CopyTo(ref end);
count += end.CopyFrom(_bytesEndHeaders, 0, _bytesEndHeaders.Length);
Member

This looks good. I was considering making CopyFrom mutate the iterator. I wonder if CopyTo should also mutate for consistency.

Contributor Author

Needed to get the count for the count; though I have 27 failing tests I need to work out.

Contributor

huh? absolutely CopyFrom should mutate the iterator, doesn't it? CopyTo can be changed to match.

ProducingComplete should only take (end), or (begin,end), but not the (count). The SocketOutput knows what to write based on the original tail's iterator returned, and the complete iterator passed in

Contributor Author

the count is for bytesPrecompted rather than traversing the iterator to work it out.

Contributor

The majority of the cases are where the entire response fits in one block, so having SocketOutput call begin.GetLength(end) will just subtract the index integers.

Member

Yep. The count is only to simplify write-behind buffering calculation. We do traverse the blocks and count how many bytes we're actually writing before calling uv_write.
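
The two points in this thread (a `CopyFrom` that advances the iterator, and a length obtained by subtracting indexes when everything fits in one block) can be sketched roughly as follows. This is a simplified single-block illustration: `BlockCursor` and `CursorExample` are hypothetical names, and the real iterator in this PR spans multiple pooled blocks.

```csharp
using System;
using System.Text;

// Hypothetical single-block cursor, for illustration only.
public struct BlockCursor
{
    private readonly byte[] _block;
    private int _index;

    public BlockCursor(byte[] block, int index)
    {
        _block = block;
        _index = index;
    }

    public int Index { get { return _index; } }

    // Copies 'data' at the current position and advances this cursor in place,
    // so the caller does not need to track a separate byte count.
    public void CopyFrom(byte[] data)
    {
        Buffer.BlockCopy(data, 0, _block, _index, data.Length);
        _index += data.Length;
    }

    // Within a single block the length is just a subtraction of the two indexes.
    public int GetLength(BlockCursor end)
    {
        return end._index - _index;
    }
}

public static class CursorExample
{
    // Writes a status line into 'block' and returns the number of bytes produced,
    // derived from the begin and end cursors rather than from a running count.
    public static int WriteStatusLine(byte[] block)
    {
        var begin = new BlockCursor(block, 0);
        var end = begin; // struct copy: 'begin' stays at the original tail
        end.CopyFrom(Encoding.ASCII.GetBytes("HTTP/1.1 "));
        end.CopyFrom(Encoding.ASCII.GetBytes("200 OK"));
        return begin.GetLength(end);
    }
}
```

With this shape, a completion call only needs the `end` cursor, because the writer still holds the tail it handed out at the start.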

@benaadams
Contributor Author

Rebased on SocketOutput changes; fingers crossed for performance

@benaadams
Contributor Author

Looks like it might be a good one, a lot of the allocations drop out

Before: [image: pre]

After: [image: post]

@benaadams
Contributor Author

RPS isn't great, but CPU is low, so I think there is high lock contention; making some minor adjustments to move work out of the locks.

@benaadams
Contributor Author

Reduced the in lock work; but also turned out I had receive side scaling turned off. When I turn that on... Oh my...

  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    11.67ms   34.29ms 657.95ms   90.26%
    Req/Sec    40.62k     4.81k   76.06k    77.62%
  77224953 requests in 1.00m, 9.49GB read
Requests/sec: 1284937.56
Transfer/sec:    161.75MB

Best I've seen in my environment yet

@halter73
Member

Nice. I'm merging this in now. I plan to make a few more changes, like changing the signature of ProducingComplete to not take a count, but that can be done in another PR.

@halter73 halter73 merged commit 81dba39 into aspnet:dev Nov 23, 2015
@benaadams benaadams deleted the faster-headers branch November 23, 2015 23:19
@benaadams
Contributor Author

🏁 time to rebase all the things...

9 participants