Use unmanaged memory directly in e.g. MemoryPool2 instead of pinned/fixed across the source #647
The memory pool is pinned in the LOH, so not in regular generational memory. There is only one pinned statement for the lifetime of the application. On the other hand, using unmanaged memory would require marshalling to and from the byte arrays for regular operations and would also, currently, not allow the use of vectorized operations on them.
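For concreteness, here is a minimal sketch of the pattern described above (the names are hypothetical, not Kestrel's actual MemoryPool2 types): a buffer large enough to land on the LOH (>= 85,000 bytes) is pinned exactly once for the lifetime of the pool, so the GC never moves it, while the managed array remains available for Vector<byte> operations.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical slab: one large managed array, pinned once for the app's lifetime.
public sealed class PinnedSlab : IDisposable
{
    public readonly byte[] Array;   // managed array, still usable with Vector<byte>
    public readonly IntPtr Ptr;     // stable native pointer for libuv/RIO
    private GCHandle _handle;

    public PinnedSlab(int size = 128 * 1024)      // >= 85,000 bytes => allocated on the LOH
    {
        Array = new byte[size];
        _handle = GCHandle.Alloc(Array, GCHandleType.Pinned); // the single pin
        Ptr = _handle.AddrOfPinnedObject();
    }

    public void Dispose() => _handle.Free();
}
```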
As a comparison, the RIO C# server shown in the benchmark results (also this PR aspnet/Benchmarks#12) is entirely managed memory.
Yes, while I do understand that it is allocated on the LOH and therefore pinning isn't a huge issue, I have seen some … There would be no need for marshalling if this change (perhaps huge) were cross-cutting throughout. Libuv expects …
As far as I understand, RIO has a similar API surface to libuv in that it expects fixed/pinned memory, so my assumption is that it would help in this case too, e.g. by lowering CPU load. And yes, I am thinking of the stuff going on in e.g. …
If you're interested, I've got a managed RIO library going at https://github.com/aL3891/RioSharp that does use an unmanaged memory pool. It also has an experimental web server and the beginnings of an ASP.NET Core host, but it's on hold right now because of the current lack of VS tooling for the CLI. I do plan to pick up the ASP.NET host once RC2 is out.
Pretty much all of the … They still exist because at one time the …
Yeah, I was wondering why these were even there, when the pool is pinned. Since these seem to incur a bounds check on the array due to indexing, there would be some minor improvements there. The same bounds checks for Vector could be removed as well by using pointers directly. Is there any existing easy way to run Kestrel with a profiler to do testing of this on a single PC? Yes, it is the …
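To illustrate the bounds-check point (hypothetical code, not Kestrel source): the JIT can elide the range check in a canonical for loop over array.Length, but not when the index comes from elsewhere, as with a pool iterator; a pointer under fixed sidesteps the check entirely, at the cost of the caller having to validate the index.

```csharp
// Hypothetical illustration. Accessing via an externally supplied index
// (as a pool iterator does) keeps the bounds check; a fixed pointer removes it.
static int ReadIndexed(byte[] block, int index)
{
    return block[index];          // JIT emits a range check here
}

static unsafe int ReadViaPointer(byte[] block, int index)
{
    fixed (byte* p = block)       // effectively free if the array is already pinned
    {
        return p[index];          // no range check; caller must validate index
    }
}
```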
Need to wait for https://github.com/dotnet/corefx/issues/5474
Yes, can do something like this benaadams@686ad9d
Alas, while it's easy to get memory profiles with various tools like loadtest, you need to use … For my testing I use two Azure G4 machines, one Linux, one Windows, and they generate enough load between them (can get > 12GBps with large responses with this setup).
Gap in the market for a good Windows HTTP perf tool that is multi-threaded, does fast loopback, and does pipelining ;-)
You can use …
Hmm, can micro-benchmarking the pool be used, or are the load patterns very different, so these won't necessarily show this? There don't seem to be any perf tests as part of the Kestrel source; wouldn't these add some value and allow for easier contribution? Yeah, the out-of-box experience for these things in dotnet is not good ;) In fact, out-of-box for the aspnet benchmark repo is pretty bad too; it just doesn't work right now. A .NET Core perf tool would be great.
The more general tests are at https://github.com/aspnet/Performance. For specific micro-benchmarking I recommend https://github.com/PerfDotNet/BenchmarkDotNet. @mattwarren gives some good insight into recent features: http://mattwarren.github.io/2016/02/17/adventures-in-benchmarking-memory-allocations/
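To make the suggestion concrete, here is a minimal BenchmarkDotNet sketch in the spirit of the discussion above, comparing indexed access against pointer access over the same buffer (the benchmark itself is hypothetical, not from the thread; the class and method names are made up):

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public unsafe class PoolAccessBenchmarks
{
    private readonly byte[] _buffer = new byte[4096];

    [Benchmark(Baseline = true)]
    public int Indexed()
    {
        int sum = 0;
        for (int i = 0; i < _buffer.Length; i++) sum += _buffer[i];
        return sum;
    }

    [Benchmark]
    public int Pointer()
    {
        int sum = 0;
        fixed (byte* p = _buffer)
        {
            for (int i = 0; i < _buffer.Length; i++) sum += p[i];
        }
        return sum;
    }
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<PoolAccessBenchmarks>();
}
```

Running the console app prints a summary table with the relative timings of the two methods.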
I'd assume there are issues around taking a dependency on an unknown, low-contributor, not-well-supported package, for which MS would then have to take on the onus of providing security and support. How would they provide emergency patching for it in case of a security alert if you were on holiday, so they couldn't update the source the dependency was based on? Would be messy...
Again, I assume this is an area that will be improved post-RTM, as Kestrel performance is good and there is more need for stabilising features currently. As an aside, normally when I make a performance-improvement PR, to make it easier to evaluate I'll include a gist of the micro-benchmark and the measurements from it, for example #498 (comment) and #634 (comment) (e.g. the 2nd one not included); if they show good results then they can be investigated further. If it's a system-wide change then I'll do a before and after …
Thanks for the reply again. I know and have used BenchmarkDotNet, and it is definitely useful for determining the performance characteristics of different pieces of code in isolation. It is a work in progress, though; I have had issues with it not working with referenced assemblies etc. Also, it is very conservative, so for a quick feedback loop it is too cumbersome in my view, but once some different approaches need a definite answer then yes, it is great. Just hoping it will have a better OOB experience too. Yes, I have seen your micro-benchmarks and read them with great interest to see the tricks used, which is why I am also following the development of Kestrel. It's great to have this out in the open. I was rather looking for some good and easy out-of-the-box "F5" experience with regard to Kestrel perf work; the out-of-the-box run experience for dotnet repos is terrible as far as I can tell. I have not been able to run … Sorry, that was a rant.
I understand that very well, but before RTM and before …
It's generally good; it's just a bad time. The move to the dotnet CLI has made things quite unstable, so much so that the release date for RC2 has moved to TBD: https://github.com/aspnet/Home/wiki/Roadmap. It should be better again when that settles down.
FYI, RioSharp has one; it has the same CLI interface as wrk. It's still very limited though: it can only do GETs and only count the number of requests, but it's handy if you want to generate a lot of load on Windows.
@aL3891 RioSharp.BenchClient? Interesting, I will be checking it out!
Hopefully we've fixed these issues now, but if you are still having problems please raise an issue and we will take a look. Hopefully it's now less "work in progress" and heading towards something more complete/stable.
We try to favour accurate results over quickness. Having said that, maybe we were too conservative. The latest release has an improved core that tries to address this issue, i.e. we will still give accurate results, but get them as fast as possible. Also, you can override the defaults and force benchmarks to do fewer iterations if you want to.
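For reference, a hedged sketch of what overriding the defaults can look like (the SimpleJob attribute and parameter names here match recent BenchmarkDotNet releases and may differ in older ones):

```csharp
using BenchmarkDotNet.Attributes;

// Force fewer iterations for a quicker, less precise run.
// Older releases used e.g. targetCount instead of iterationCount.
[SimpleJob(launchCount: 1, warmupCount: 3, iterationCount: 5)]
public class QuickFeedbackBenchmark
{
    [Benchmark]
    public int Sum()
    {
        int total = 0;
        for (int i = 0; i < 1000; i++) total += i;
        return total;
    }
}
```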
Any specifics here? Instructions on writing benchmarks, how to run them, available features, or something else? FWIW, our readme page on GitHub (https://github.com/PerfDotNet/BenchmarkDotNet/) has had an overhaul; it now begins with a step-by-step guide. Full disclaimer: I'm one of the authors, so I'm probably biased :-)
@nietras do you have a profile screenshot? I would like to take this up with the CLR team if the cost of fixed/pinned is actually showing up.
I know! I follow your blog too, great work. Thanks for replying.
As a good digital citizen I have in fact filed an issue dotnet/BenchmarkDotNet#72 which hasn't been marked as resolved. This, therefore, also means I have not tried the latest version.
My biggest issue, I guess, is that BenchmarkDotNet does not come with memory profiling enabled by default. At a minimum, this could be as simple as adding GC collection counts for the different generations (this is pretty easy to measure and does not require external tools). Additionally, diagnosers are, as far as I can tell, still not available as a nuget package. There is very little info on using and setting up diagnosers, again specifically for memory benchmarking, but also for something else I think has great value: actual machine assembly code output. @sajayantony sorry, I do not have any particular profiling data, since I pretty much got stuck just trying to get stuff working. And it is not really the fixing itself I am so much "worried" about; it is actually the unnecessary bounds checking and code size. Since the code has been optimized so much, I think trying to minimize these minor things could help perf. And some of these things will not show up in micro-benchmarking (like code size) but only in somewhat realistic full testing scenarios, which I wanted to try to get a baseline for first. But micro-benchmarking could be the first step. Not sure when I will have time to look at this, though.
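As a sketch of the simple GC-count measurement mentioned above (a hypothetical helper, not BenchmarkDotNet API): snapshot GC.CollectionCount per generation around the code under test and report the deltas.

```csharp
using System;

static class GcCounter
{
    // Run the action and report how many collections it triggered per generation.
    public static void Measure(Action action)
    {
        int gen0 = GC.CollectionCount(0);
        int gen1 = GC.CollectionCount(1);
        int gen2 = GC.CollectionCount(2);

        action();

        Console.WriteLine(
            $"Gen0: {GC.CollectionCount(0) - gen0}, " +
            $"Gen1: {GC.CollectionCount(1) - gen1}, " +
            $"Gen2: {GC.CollectionCount(2) - gen2}");
    }
}
```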
So I don't hijack this thread any more, I opened an issue on BenchmarkDotNet to discuss this.
It seems to me that the memory pool really should be based on unmanaged memory since it is pinned/fixed anyway when used with libuv or similar. Changing this should not only reduce the cross-cutting duplication of pinned/fixed statements but perhaps also improve perf.
Asking in a different way, why insist on using managed byte arrays when these are used in a fixed manner?
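For concreteness, a minimal hedged sketch of what an unmanaged slab could look like (a hypothetical type, not a proposed MemoryPool2 implementation): the pointer handed to libuv is stable by construction, with no pinning or fixed statements required.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical unmanaged slab: native memory is never moved by the GC,
// so the pointer can be passed to libuv/RIO without pinning.
public sealed class UnmanagedSlab : IDisposable
{
    public readonly IntPtr Ptr;
    public readonly int Length;

    public UnmanagedSlab(int length = 128 * 1024)
    {
        Length = length;
        Ptr = Marshal.AllocHGlobal(length);
    }

    public void Dispose() => Marshal.FreeHGlobal(Ptr);
}
```

The trade-off raised earlier in the thread still applies: reading and writing this memory from managed code means pointer access (or copying), and Vector<T> operations over a managed byte[] are no longer directly available.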