Managed RIO server #10


Closed · wants to merge 25 commits

Conversation

@benaadams (Contributor)

#9 Managed Registered IO Server, using IOCP

@DamianEdwards (Member)

Performance looks great. We're seeing between 5 and 5.9 million RPS with wrk -c 256 -t 32 and pipelining at a depth of 11. Any change to those parameters in either direction and it drops. We still have CPU to spare: between 30% and 50% remaining.

Immediate issue appears to be that pipelining above depth of 11 results in numbers going down. Our hunch is that there's an issue when reading request payloads across datagrams. Would be great if you could look into that because we still have CPU spare and I'd love to see this thing max it 😄
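The suspected failure mode can be sketched in a few lines: a pipelined request can be split across two TCP receives, so the parser has to buffer the tail of one receive and prepend it to the next. This is only an illustration of the idea (the PR's actual parser is C#, and the helper name here is made up):

```python
# Hypothetical sketch: handling a pipelined HTTP request split across
# two receives. Requests are terminated by a blank line (CRLF CRLF).

def split_requests(buffer: bytes):
    """Return (complete_requests, leftover) for CRLF-CRLF-terminated requests."""
    complete = []
    while (end := buffer.find(b"\r\n\r\n")) != -1:
        complete.append(buffer[:end + 4])
        buffer = buffer[end + 4:]
    return complete, buffer

# Two receives that split the second request mid-header:
recv1 = b"GET / HTTP/1.0\r\nHost: a\r\n\r\nGET / HTTP/1.0\r\nHo"
recv2 = b"st: a\r\n\r\n"

reqs, leftover = split_requests(recv1)
assert len(reqs) == 1 and leftover == b"GET / HTTP/1.0\r\nHo"

# The leftover must be carried over to the next receive:
reqs2, leftover2 = split_requests(leftover + recv2)
assert len(reqs2) == 1 and leftover2 == b""
```

If the carry-over step is missing or a buffer boundary is mishandled, throughput would degrade exactly when requests start straddling receives, which fits the observed depth threshold.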

We'd love to see the "same" server written without RIO (just usual WinSock) to compare the difference, as right now we only have our libuv-based servers to compare with.

@benaadams (Contributor, Author)

Immediate issue appears to be that pipelining above depth of 11 results in numbers going down.

Do you know the request byte size, or is it variable? If you could post a list of 15, that would be helpful, e.g.

GET / HTTP/1.0
Connection: Keep-Alive
Host: 127.0.0.1:5000
User-Agent: Test
Accept: */*

GET / HTTP/1.0
Connection: Keep-Alive
Host: 127.0.0.1:5000
User-Agent: Test
Accept: */*

GET / HTTP/1.0
Connection: Keep-Alive
Host: 127.0.0.1:5000
User-Agent: Test
Accept: */*

GET / HTTP/1.0
Connection: Keep-Alive
Host: 127.0.0.1:5000
User-Agent: Test
Accept: */*

GET / HTTP/1.0
Connection: Keep-Alive
Host: 127.0.0.1:5000
User-Agent: Test
Accept: */*

GET / HTTP/1.0
Connection: Keep-Alive
Host: 127.0.0.1:5000
User-Agent: Test
Accept: */*

GET / HTTP/1.0
Connection: Keep-Alive
Host: 127.0.0.1:5000
User-Agent: Test
Accept: */*

GET / HTTP/1.0
Connection: Keep-Alive
Host: 127.0.0.1:5000
User-Agent: Test
Accept: */*

etc...

You can change Host etc. as long as the character count stays the same; that will make it easier to track down in the debugger.
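For reference, the byte count of one request as written above is easy to check. This is a sketch assuming CRLF line endings and exactly the header text shown; wrk's real request may differ slightly:

```python
# Byte size of one pipelined request as written above (CRLF line
# endings assumed).
request = (
    b"GET / HTTP/1.0\r\n"
    b"Connection: Keep-Alive\r\n"
    b"Host: 127.0.0.1:5000\r\n"
    b"User-Agent: Test\r\n"
    b"Accept: */*\r\n"
    b"\r\n"
)
assert len(request) == 95

# How many such requests fit in one 1460-byte TCP segment payload:
assert 1460 // len(request) == 15
```

With this exact header text, 15 requests fit per segment; a shorter Host or User-Agent header pushes that threshold higher, which is why the fragmentation depth depends on header sizes.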

@benaadams (Contributor, Author)

We'd love to see the "same" server written without RIO (just usual WinSock) to compare the difference

Shouldn't be too hard (famous last words)

@benaadams (Contributor, Author)

Should have fixed pipelining

@benaadams (Contributor, Author)

Something weird is still happening at 11; still looking.

@DamianEdwards (Member)

OK, I ran it again. Got ~6.7 million RPS at 45% CPU utilization and 7.6 Gbps, so we have both CPU and network to spare 😄

@benaadams (Contributor, Author)

Request packets shouldn't fragment until past a pipeline depth of 26+, depending on the Host header size.
Response packets should fragment at 15, as there are only 1460 bytes of payload per packet (plus the 40-byte TCP/IP header).

So 11 is really weird...

Digging through the code. You don't have 2 wrk servers on the 10 Gbps network to hit it with? Maybe wrk breaks? (Though it's more likely my code.)

@DamianEdwards (Member)

I can hit it with two wrk machines to see if that makes a difference.

@benaadams (Contributor, Author)

Thinking aloud

In theory it should be possible to pipeline 29 deep without the request fragmenting, and get the responses back in 2 packets with 20 bytes spare, for nice network saturation. Though you would need to find the sweet spot on that pipelining for each load server's threads.

The code isn't breaking on any numbers that make sense; though I'm also getting the peak at 11 between an Azure Windows VM and an Azure Linux VM, so that would suggest it's the code, as your test system and my Azure one are different environments.

Maybe there are buffer size mismatches that like 1100 bytes but not 1200+ bytes... hmm...
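The back-of-envelope packet math above can be written out explicitly. The 50-byte request and 100-byte response sizes below are assumptions chosen to match the observed thresholds, not measured values:

```python
# Sketch of the fragmentation arithmetic (assumed sizes, not measured).
MSS = 1460        # Ethernet MTU 1500 minus 40 bytes of TCP/IP headers
REQUEST = 50      # approx. wrk request size; varies with the Host header
RESPONSE = 100    # approx. plaintext response size

# Requests: 29 pipelined requests still fit in a single segment...
assert 29 * REQUEST <= MSS
# ...but 30 would fragment.
assert 30 * REQUEST > MSS

# Responses: fragment at depth 15, matching the earlier observation...
assert 14 * RESPONSE <= MSS and 15 * RESPONSE > MSS
# ...and 29 responses fill exactly 2 segments with 20 bytes to spare.
assert 2 * MSS - 29 * RESPONSE == 20
```

On these assumed sizes, nothing interesting happens at depth 11, which is what makes that number so strange.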

@benaadams (Contributor, Author)

The request fragment size assumes a pipelining script of

 r[1] = wrk.format(nil, "/")

@DamianEdwards (Member)

Yep, that's what our pipeline script is doing

@benaadams (Contributor, Author)

Quadrupled the RIO event queue read size, just in case events were getting dropped.

@benaadams (Contributor, Author)

Wow; please tell me you don't have Receive Side Scaling switched on, and then go switch it on:

[screenshot: NIC advanced settings, Receive Side Scaling]

My speed shot through the roof...

@benaadams (Contributor, Author)

You probably want the newer code too, as I was taking shortcuts on the reading.

@DamianEdwards (Member)

Sorry man, RSS was already on; we chose the "Web Server" profile in the Intel NIC settings (lots of knobs and dials we could change, but the profile named "Web Server" seemed like a good fit 😉).

Trying new code now.

@DamianEdwards (Member)

New code is about the same as before.

@DamianEdwards (Member)

I played around with the wrk parameters a bit and just broke 7 million RPS for the first time: wrk -c 450 -t 32 -s ./scripts/pipeline.lua http://10.0.100:5000 --100

Yep, I pipelined at depth 100 😄

CPU usage gets lower as pipeline depth grows, which I think makes sense as network reads become more efficient (fewer datagrams with only part of a request).

@benaadams (Contributor, Author)

It would also break the read->write dependency in the same way upping the connections will, as request packets can be in flight while responses are being generated; whereas with a single packet, request->response->request is serialised.

Definitely sounds like a bug somewhere; however, it's hard to test as it happily crushes any network I put it on 😆

@benaadams (Contributor, Author)

CPU usage gets lower as pipeline depth grows, which I think makes sense as network reads become more efficient (fewer datagrams with only part of a request).

I think that's also an effect of RIO, as it signals receives in batches rather than per packet; so if 5 packets come in at the same time it will send 1 IOCP message rather than the 5 IOCP messages of more traditional WinSock.
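The batching effect can be illustrated with a toy model. This is not the Windows RIO API (which is C, obtained via a WSAIoctl function table); it just contrasts one wake-up per batch with one wake-up per packet:

```python
# Toy model of RIO-style batched completion delivery: many receive
# completions are enqueued, but only one IOCP notification is armed,
# and a single dequeue call drains them all.
from collections import deque

class ToyCompletionQueue:
    def __init__(self):
        self.events = deque()
        self.notifications = 0

    def post_batch(self, packets):
        # Batch arrives: enqueue every completion, signal once.
        self.events.extend(packets)
        self.notifications += 1

    def dequeue_all(self):
        # One call drains every pending completion.
        drained = list(self.events)
        self.events.clear()
        return drained

q = ToyCompletionQueue()
q.post_batch(["pkt1", "pkt2", "pkt3", "pkt4", "pkt5"])
assert q.notifications == 1          # one wake-up for five packets
assert len(q.dequeue_all()) == 5
```

In the traditional per-packet model, the same five packets would cost five notifications and five dequeues, which is where the extra CPU goes at low pipeline depths.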

Commit: Added dynamic Date: header and changed response string to "Hello, World!"
@benaadams (Contributor, Author)

Since there is spare CPU and network, I added changes so it would pass TechEmpower Test type 6 (Plaintext) at 256, 1024 and 4096 concurrency; specifically:

Body value

The response body must be Hello, World!

Compose headers and body:

the response must be fully composed from this and its headers within the scope of each request and it is not acceptable to store the entire payload of the response, headers inclusive, as a pre-rendered buffer.

Add a Date header, which is the current date in RFC 1123 format

The response headers must include Server and Date.
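The required Date value is the current time in RFC 1123 format, e.g. "Tue, 21 Jul 2015 17:32:28 GMT". A quick Python sketch of the format only (the PR itself generates this in C#):

```python
# Build an RFC 1123 Date header value using only the standard library.
from datetime import datetime
from email.utils import formatdate

date_header = "Date: " + formatdate(usegmt=True)

# Sanity check: the value round-trips through the RFC 1123 layout.
value = date_header[len("Date: "):]
datetime.strptime(value, "%a, %d %b %Y %H:%M:%S GMT")
```

Because the value changes every second, the spec's "no pre-rendered buffer" rule effectively forces the header block to be composed (or at least patched) per response.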

Also tweaked the final send in a batch; not sure if it will do anything, but since there is CPU to play with...

Not sure if 16,384 concurrency would break it, but probably; there are some allocations that are a function of the number of threads which might need tweaking for that. Trying to run with that concurrency breaks my wrk server, though.

The response payload is 1.5× bigger and has some dynamic elements (Date), so it may perform worse; but the response-end tweak might help?

@benaadams (Contributor, Author)

Think I found the other bug...

@benaadams (Contributor, Author)

Should work better

@benaadams (Contributor, Author)

Should also pass the 16,384 concurrency test on TechEmpower's 40-core environment.

Commit: Seem to exhaust them in some tests when given the exact amount; currently over-provisioned by 4×, though only 4 seem to be needed.
@DamianEdwards (Member)

We moved things around in the repo. Can you rebase and move your project to the experimental folder please?

@benaadams mentioned this pull request Jul 22, 2015
@benaadams (Contributor, Author)

Rebasing is beyond my git ken; I have created another PR instead.

@benaadams closed this Jul 22, 2015