Use unmanaged memory directly in e.g. MemoryPool2 instead of pinned/fixed across the source #647
The memory pool is pinned in the LOH, so not in regular generational memory. There is only one pinned statement for the lifetime of the application. On the other hand, using unmanaged memory would require marshalling to and from the byte arrays for regular operations and would also, currently, not allow the use of vectorized operations on them.
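For concreteness, here is a minimal sketch of the pattern described above (the names are hypothetical, not Kestrel's actual MemoryPool2 types): a buffer large enough to land on the LOH (>= 85,000 bytes) is pinned exactly once for the lifetime of the pool, so the GC never moves it, while the managed array remains available for Vector<byte> operations.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical slab: one large managed array, pinned once for the app's lifetime.
public sealed class PinnedSlab : IDisposable
{
    public readonly byte[] Array;   // managed array, still usable with Vector<byte>
    public readonly IntPtr Ptr;     // stable native pointer for libuv/RIO
    private GCHandle _handle;

    public PinnedSlab(int size = 128 * 1024)      // >= 85,000 bytes => allocated on the LOH
    {
        Array = new byte[size];
        _handle = GCHandle.Alloc(Array, GCHandleType.Pinned); // the single pin
        Ptr = _handle.AddrOfPinnedObject();
    }

    public void Dispose() => _handle.Free();
}
```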
As a comparison, the RIO C# server shown in the benchmark results (also this PR aspnet/Benchmarks#12) is entirely managed memory.
Yes, while I do understand that it is allocated on the LOH and therefore pinning isn't a huge issue, I have seen some … There would be no need for marshalling if this change (perhaps huge) were cross-cutting throughout. Libuv expects …
As far as I understand, RIO has a similar API surface to libuv in that it expects fixed/pinned memory, so my assumption is that it would help in this case too, e.g. by lowering CPU load. And yes, I am thinking of the stuff going on in e.g. …
If you're interested, I've got a managed RIO library going at https://github.com/aL3891/RioSharp that does use an unmanaged memory pool. It also has an experimental web server and the beginnings of an ASP.NET Core host, but it's on hold right now because of the current lack of VS tooling for the CLI. I do plan to pick up the ASP.NET host once RC2 is out.
Pretty much all of the … They still exist because at one time the …
Yeah, I was wondering why these were even there, when the pool is pinned. Since these seem to incur a bounds check on the array due to indexing, there would be some minor improvements there. The same bounds checks for Vector could be removed as well by using pointers directly. Is there any existing easy way to run Kestrel with a profiler to do testing of this on a single PC? Yes, it is the …
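To illustrate the bounds-check point (hypothetical code, not Kestrel source): the JIT can elide the range check in a canonical for loop over array.Length, but not when the index comes from elsewhere, as with a pool iterator; a pointer under fixed sidesteps the check entirely, at the cost of the caller having to validate the index.

```csharp
// Hypothetical illustration. Accessing via an externally supplied index
// (as a pool iterator does) keeps the bounds check; a fixed pointer removes it.
static int ReadIndexed(byte[] block, int index)
{
    return block[index];          // JIT emits a range check here
}

static unsafe int ReadViaPointer(byte[] block, int index)
{
    fixed (byte* p = block)       // effectively free if the array is already pinned
    {
        return p[index];          // no range check; caller must validate index
    }
}
```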
Need to wait for https://github.com/dotnet/corefx/issues/5474
Yes, can do something like this benaadams@686ad9d
Alas, while it's easy to get memory profiles with various tools like loadtest, you need to use … For my testing I use two Azure G4 machines, one Linux, one Windows, and they generate enough load between them (can get > 12GBps with large responses with this setup).
Gap in the market for a good Windows HTTP perf tool that is multi-threaded, does fast loopback, and does pipelining ;-)
You can use …
Hmm, can micro-benchmarking the pool be used, or are the load patterns very different, so these won't necessarily show this? There don't seem to be any perf tests as part of the Kestrel source; wouldn't these add some value and allow for easier contribution? Yeah, the out-of-box experience for these things in dotnet is not good ;) In fact, out-of-box for the aspnet benchmark repo is pretty bad too; it just doesn't work right now. A .NET Core perf tool would be great.
The more general tests are at https://github.com/aspnet/Performance. For specific micro-benchmarking I recommend https://github.com/PerfDotNet/BenchmarkDotNet. @mattwarren gives some good insight into recent features: http://mattwarren.github.io/2016/02/17/adventures-in-benchmarking-memory-allocations/
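To make the suggestion concrete, here is a minimal BenchmarkDotNet sketch in the spirit of the discussion above, comparing indexed access against pointer access over the same buffer (the benchmark itself is hypothetical, not from the thread; the class and method names are made up):

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public unsafe class PoolAccessBenchmarks
{
    private readonly byte[] _buffer = new byte[4096];

    [Benchmark(Baseline = true)]
    public int Indexed()
    {
        int sum = 0;
        for (int i = 0; i < _buffer.Length; i++) sum += _buffer[i];
        return sum;
    }

    [Benchmark]
    public int Pointer()
    {
        int sum = 0;
        fixed (byte* p = _buffer)
        {
            for (int i = 0; i < _buffer.Length; i++) sum += p[i];
        }
        return sum;
    }
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<PoolAccessBenchmarks>();
}
```

Running the console app prints a summary table with the relative timings of the two methods.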
I'd assume there are issues around taking a dependency on an unknown, low-contributor, not-well-supported package, for which MS would then have to take on the onus of providing security and support. How would they provide emergency patching for it in case of a security alert if you were on holiday, so they couldn't update the source the dependency was based on? Would be messy...
Again, I assume this is an area that will be improved post-RTM, as Kestrel performance is good and there is more need for stabilising features currently. As an aside, normally when I make a performance-improvement PR, to make it easier to evaluate I'll include a gist of the micro-benchmark and the measurements from it, for example #498 (comment) and #634 (comment) (e.g. the 2nd one not included); if they show good results then they can be investigated further. If it's a system-wide change then I'll do a before and after …
Thanks for the reply again. I know and have used BenchmarkDotNet, and it is definitely useful for determining the performance characteristics of different pieces of code in isolation. It is a work in progress, though; I have had issues with it not working with referenced assemblies etc. Also, it is very conservative, so for a quick feedback loop it is too cumbersome in my view, but once some different approaches need a definite answer then yes, it is great. Just hoping it will have a better OOB experience too. Yes, I have seen your micro-benchmarks and read them with great interest to see the tricks used, which is why I am also following the development of Kestrel. It's great to have this out in the open. I was rather looking for some good and easy out-of-the-box "F5" experience with regard to Kestrel perf work; the out-of-the-box run experience for dotnet repos is terrible as far as I can tell. I have not been able to run … Sorry, that was a rant.
I understand that very well, but before RTM and before …
It's generally good; it's just a bad time. The move to the dotnet CLI has made things quite unstable, so much so that the release date for RC2 has moved to TBD: https://github.com/aspnet/Home/wiki/Roadmap. It should be better again when that settles down.
FYI, RioSharp has one; it has the same CLI interface as wrk. It's still very limited though: it can only do GETs and only count the number of requests, but it's handy if you want to generate a lot of load on Windows.
@aL3891 RioSharp.BenchClient? Interesting, I will be checking it out!
Hopefully we've fixed these issues now, but if you are still having problems please raise an issue and we will take a look. Hopefully it's now less "work in progress" and heading towards something more complete/stable.
We try to favour accurate results over quickness. Having said that, maybe we were too conservative. The latest release has an improved core that tries to address this issue, i.e. we will still give accurate results, but get them as fast as possible. Also, you can override the defaults and force benchmarks to do fewer iterations if you want to.
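For reference, a hedged sketch of what overriding the defaults can look like (the SimpleJob attribute and parameter names here match recent BenchmarkDotNet releases and may differ in older ones):

```csharp
using BenchmarkDotNet.Attributes;

// Force fewer iterations for a quicker, less precise run.
// Older releases used e.g. targetCount instead of iterationCount.
[SimpleJob(launchCount: 1, warmupCount: 3, iterationCount: 5)]
public class QuickFeedbackBenchmark
{
    [Benchmark]
    public int Sum()
    {
        int total = 0;
        for (int i = 0; i < 1000; i++) total += i;
        return total;
    }
}
```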
Any specifics here? Instructions on writing benchmarks, how to run them, available features, or something else? FWIW, our readme page on GitHub (https://github.com/PerfDotNet/BenchmarkDotNet/) has had an overhaul; it now begins with a step-by-step guide. Full disclaimer: I'm one of the authors, so I'm probably biased :-)
@nietras do you have a profile screenshot? I would like to take this up with the CLR team if the cost of fixed/pinned is actually showing up.
I know! I follow your blog too, great work. Thanks for replying.
As a good digital citizen I have in fact filed an issue dotnet/BenchmarkDotNet#72 which hasn't been marked as resolved. This, therefore, also means I have not tried the latest version.
My biggest issue, I guess, is that BenchmarkDotNet does not come with memory profiling enabled by default. At a minimum, this could be as simple as adding GC collection counts for the different generations (this is pretty easy to measure and does not require external tools). Additionally, diagnosers are, as far as I can tell, still not available as a nuget package. There is very little info on using and setting up diagnosers, again specifically for memory benchmarking, but also for something else I think has great value: actual machine assembly code output. @sajayantony sorry, I do not have any particular profiling data, since I pretty much got stuck just trying to get stuff working. And it is not really the fixing itself I am so much "worried" about; it is actually the unnecessary bounds checking and code size. Since the code has been optimized so much, I think trying to minimize these minor things could help perf. And some of these things will not show up in micro-benchmarking (like code size) but only in somewhat realistic full testing scenarios, which I wanted to try to get a baseline for first. But micro-benchmarking could be the first step. Not sure when I will have time to look at this, though.
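As a sketch of the simple GC-count measurement mentioned above (a hypothetical helper, not BenchmarkDotNet API): snapshot GC.CollectionCount per generation around the code under test and report the deltas.

```csharp
using System;

static class GcCounter
{
    // Run the action and report how many collections it triggered per generation.
    public static void Measure(Action action)
    {
        int gen0 = GC.CollectionCount(0);
        int gen1 = GC.CollectionCount(1);
        int gen2 = GC.CollectionCount(2);

        action();

        Console.WriteLine(
            $"Gen0: {GC.CollectionCount(0) - gen0}, " +
            $"Gen1: {GC.CollectionCount(1) - gen1}, " +
            $"Gen2: {GC.CollectionCount(2) - gen2}");
    }
}
```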
So I don't hijack this thread any more, I opened an issue on BenchmarkDotNet to discuss this.
It seems to me that the memory pool really should be based on unmanaged memory since it is pinned/fixed anyway when used with libuv or similar. Changing this should not only reduce the cross-cutting duplication of pinned/fixed statements but perhaps also improve perf.
Asking in a different way, why insist on using managed byte arrays when these are used in a fixed manner?
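For concreteness, a minimal hedged sketch of what an unmanaged slab could look like (a hypothetical type, not a proposed MemoryPool2 implementation): the pointer handed to libuv is stable by construction, with no pinning or fixed statements required.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical unmanaged slab: native memory is never moved by the GC,
// so the pointer can be passed to libuv/RIO without pinning.
public sealed class UnmanagedSlab : IDisposable
{
    public readonly IntPtr Ptr;
    public readonly int Length;

    public UnmanagedSlab(int length = 128 * 1024)
    {
        Length = length;
        Ptr = Marshal.AllocHGlobal(length);
    }

    public void Dispose() => Marshal.FreeHGlobal(Ptr);
}
```

The trade-off raised earlier in the thread still applies: reading and writing this memory from managed code means pointer access (or copying), and Vector<T> operations over a managed byte[] are no longer directly available.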