Optimize FileSerializationSink by using parking_lot::Mutex and avoiding heap allocations in write_atomic. #88

michaelwoerister · 2019-11-15T11:57:18Z

This PR makes the FileSerializationSink exactly as fast as the MmapSerializationSink in the benchmarks we have. But I only tested on Linux and I also remember that the benchmarks were no good indication of performance when used in rustc.

It would be nice if we could get rid of the MmapSerializationSink because it keeps everything in memory until the end.

I wonder how much work it would be to have benchmarks that actually run rustc. It's a hassle to test this manually.

michaelwoerister · 2019-11-15T12:59:52Z

I think our benchmarks generate too little data to be realistic (less than a megabyte, I think). I'll look into that later. Also, a multi-threaded benchmark would be good.
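A rough sketch of what a multi-threaded benchmark body could look like, using only the standard library. The function name `bench_shared_sink`, the in-memory `Vec<u8>` stand-in for the sink, and the parameter values are assumptions for illustration; this is not the benchmark code in the PR.

```rust
use std::sync::Mutex;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: each of `num_threads` threads appends
/// `events_per_thread` records of `event_size` bytes to one shared,
/// mutex-protected buffer. Returns the bytes written and the wall-clock time.
pub fn bench_shared_sink(
    num_threads: usize,
    events_per_thread: usize,
    event_size: usize,
) -> (usize, Duration) {
    // An in-memory Vec stands in for the real sink (assumption).
    let sink = Mutex::new(Vec::<u8>::new());
    let start = Instant::now();

    // Scoped threads may borrow `sink` without wrapping it in an Arc.
    thread::scope(|s| {
        for t in 0..num_threads {
            let sink = &sink;
            s.spawn(move || {
                let record = vec![t as u8; event_size];
                for _ in 0..events_per_thread {
                    sink.lock().unwrap().extend_from_slice(&record);
                }
            });
        }
    });

    let elapsed = start.elapsed();
    let bytes = sink.into_inner().unwrap().len();
    (bytes, elapsed)
}
```

Raising `events_per_thread` is one way to push the generated data well past a megabyte; the returned byte count makes it easy to check that nothing was lost under contention.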

wesleywiser · 2019-11-15T13:25:52Z

> But I only tested on Linux and I also remember that the benchmarks were no good indication of performance when used in rustc.

FYI, as I recall, the issue was that mmap was quite a bit slower on Windows.

michaelwoerister · 2019-11-15T16:07:17Z

Here are some benchmark numbers with and without the patch (numbers in milliseconds):

| Windows | before | after | improvement |
|---|---|---|---|
| file, 1 thread | 897 | 773 | ~15% |
| file, 8 threads | 921 | 663 | ~30% |
| mmap, 1 thread | 909 | 939 | - |
| mmap, 8 threads | 839 | 830 | - |

| Linux | before | after | improvement |
|---|---|---|---|
| file, 1 thread | 605 | 510 | 15-20% |
| file, 8 threads | 765 | 752 | - |
| mmap, 1 thread | 512 | 519 | - |
| mmap, 8 threads | 378 | 349 | - |

The patch should be very good for Windows, especially for the multithreaded case. On Linux mmap is as fast (1 thread) or much faster (8 threads). I don't have any numbers for macOS.
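For reference, a small sketch of how the "improvement" column can be reproduced, assuming it is the relative reduction in runtime (the comment does not state the formula, so this is an assumption). The function name is hypothetical.

```rust
/// Relative runtime reduction in percent: positive means "after" is faster.
/// Assumes the improvement column is (before - after) / before.
pub fn improvement_percent(before_ms: f64, after_ms: f64) -> f64 {
    (before_ms - after_ms) / before_ms * 100.0
}
```

Under that assumption, the Windows single-threaded file case (897 ms to 773 ms) comes out just under 14%, consistent with the rounded "~15%" in the table.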

wesleywiser · 2019-11-15T16:15:23Z

I can gather some macOS numbers if you're interested.

michaelwoerister · 2019-11-15T16:24:29Z

If you have time, I'd be interested, yeah.

I got the numbers by running cargo +nightly bench. The "after" numbers are from this PR as it is, and the "before" numbers are from this PR, but with commit 8367ec6 removed. The last commit has to stay in because it modifies the benchmarks.

michaelwoerister · 2019-11-15T16:26:18Z

Update: I just pushed the "baseline" version to https://github.com/michaelwoerister/measureme/tree/opt-file-sink-ref.

wesleywiser · 2019-11-18T03:11:48Z

Here's what I'm seeing on my 4 core 8 thread MBP:

| macOS | before | after | improvement |
|---|---|---|---|
| file, 1 thread | 759 | 556 | ~26% |
| file, 8 threads | 1,666 | 559 | ~66% |
| mmap, 1 thread | 670 | 622 | ~7% |
| mmap, 8 threads | 394 | 386 | ~2% |

michaelwoerister · 2019-11-18T08:00:11Z

Thanks, @wesleywiser! Looks like we actually might want to switch to FileSerializationSink on macOS too for the non-parallel case.

wesleywiser

Is the second commit still WIP? If not, this looks good to me overall.


michaelwoerister · 2019-11-19T08:27:07Z

The second commit is still work-in-progress, yes. I want to update the code in testing_common to handle multiple threads, which shouldn't be too hard. Thanks for the review!

michaelwoerister · 2019-11-20T14:12:49Z

I did some new Windows measurements of parking_lot vs std::sync. It looks like parking_lot is a bit faster indeed. Let's keep it.

| Windows | before | std::sync | parking_lot |
|---|---|---|---|
| file, 1 thread | 744 | 614 | 589 |
| file, 8 threads | 862 | 637 | 582 |


michaelwoerister · 2019-11-20T14:52:56Z

This is ready for review now.

andjo403 · 2019-11-20T22:30:10Z

measureme/src/file_serialization_sink.rs

        let file = fs::File::create(path)?;

        Ok(FileSerializationSink {
-            data: Mutex::new((BufWriter::new(file), 0)),


What is the gain from removing the BufWriter? It feels like the new code is similar to the BufWriter code, so would BufWriter::with_capacity(1024*512, file) give the same result?

michaelwoerister · 2019-11-21

The difference is that BufWriter does not allow directly writing to its buffer, so we are basically re-implementing BufWriter here. The interface of write_atomic requires there to be a writable output buffer.
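To make the point concrete, here is a minimal sketch of a sink whose `write_atomic` hands the caller a mutable slice of its own buffer, which `BufWriter` cannot do. The type `BufferedSink`, its fields, and the handling of oversized writes are assumptions for illustration, not the code in this PR; it also uses `std::sync::Mutex` rather than `parking_lot::Mutex` so that it has no external dependencies.

```rust
use std::io::{self, Write};
use std::sync::Mutex;

/// Hypothetical stand-in for the sink discussed above.
pub struct BufferedSink<W: Write> {
    inner: Mutex<Inner<W>>,
}

struct Inner<W: Write> {
    out: W,
    buf: Vec<u8>, // staging buffer, allocated once up front
    used: usize,  // number of valid bytes at the start of `buf`
    addr: u64,    // offset at which the next write will land
}

impl<W: Write> BufferedSink<W> {
    pub fn new(out: W, capacity: usize) -> Self {
        let inner = Inner { out, buf: vec![0; capacity], used: 0, addr: 0 };
        BufferedSink { inner: Mutex::new(inner) }
    }

    /// Reserves `num_bytes` and lets `write` fill them in place.
    /// Returns the offset at which the bytes were placed.
    pub fn write_atomic<F>(&self, num_bytes: usize, write: F) -> io::Result<u64>
    where
        F: FnOnce(&mut [u8]),
    {
        let mut guard = self.inner.lock().unwrap();
        let inner = &mut *guard;
        let start_addr = inner.addr;
        inner.addr += num_bytes as u64;

        if num_bytes > inner.buf.len() {
            // Oversized write (assumed handling): flush what is pending,
            // then go through a temporary buffer.
            inner.out.write_all(&inner.buf[..inner.used])?;
            inner.used = 0;
            let mut tmp = vec![0u8; num_bytes];
            write(&mut tmp);
            inner.out.write_all(&tmp)?;
            return Ok(start_addr);
        }

        if inner.used + num_bytes > inner.buf.len() {
            // Slow path: not enough room left, so flush first.
            inner.out.write_all(&inner.buf[..inner.used])?;
            inner.used = 0;
        }

        // Fast path: the caller writes straight into the buffer, no allocation.
        let start = inner.used;
        write(&mut inner.buf[start..start + num_bytes]);
        inner.used += num_bytes;
        Ok(start_addr)
    }

    /// Flushes any pending bytes and returns the underlying writer.
    pub fn finish(self) -> io::Result<W> {
        let mut inner = self.inner.into_inner().unwrap();
        inner.out.write_all(&inner.buf[..inner.used])?;
        inner.out.flush()?;
        Ok(inner.out)
    }
}
```

With `BufWriter`, the closure would have to fill a separate scratch buffer that is then copied in via `write_all`; owning the buffer removes that extra step on the fast path.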

andjo403 · 2019-11-20T22:46:42Z

Has anyone looked at how much the results differ between the "fast path", where only the buffer is updated, and the "slow path", where the file is written? Since we block all threads from writing during the file write, this can affect many events.
Also, increasing the buffer size from 8 KB to 512 KB may make the variance of the measurements bigger, since each write takes longer.
I was thinking that some of the variance seen in #67 might be due to the file writes.

michaelwoerister · 2019-11-21T08:20:24Z

@andjo403 I have not investigated variance. The bigger buffer should reduce the number of writes, while making each write larger. So they are less evenly distributed but the fixed overhead might amortize better.
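As a back-of-the-envelope illustration of the trade-off, the sketch below counts flushes for a given amount of data, assuming the buffer is flushed only when full plus once at the end for any remainder. The function name and the 64 MiB figure are hypothetical.

```rust
/// Number of flushes needed to push `total_bytes` through a buffer of
/// `buffer_size` bytes (ceiling division). Simplified model, not a measurement.
pub fn flush_count(total_bytes: u64, buffer_size: u64) -> u64 {
    assert!(buffer_size > 0, "buffer size must be non-zero");
    (total_bytes + buffer_size - 1) / buffer_size
}
```

For 64 MiB of event data, this model gives 8192 flushes with an 8 KiB buffer and 128 flushes with a 512 KiB buffer: 64 times fewer stalls, each moving 64 times as much data.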

It would be nice to do the actual file writing in a background thread via some kind of double buffering scheme. I haven't tried to implement something like that though.
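One possible shape for such a scheme, sketched with standard-library channels: two buffers circulate between the producer and a writer thread, so the producer only blocks if the previous buffer is still being written. Everything here (the type `DoubleBufferedWriter`, its methods, the error handling) is a hypothetical illustration and has not been tried against measureme.

```rust
use std::io::{self, Write};
use std::mem;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical double-buffered writer; not part of this PR.
pub struct DoubleBufferedWriter<W: Write + Send + 'static> {
    current: Vec<u8>,              // buffer currently being filled
    capacity: usize,               // soft limit before buffers are swapped
    full_tx: Sender<Vec<u8>>,      // filled buffers go to the writer thread
    empty_rx: Receiver<Vec<u8>>,   // drained buffers come back for reuse
    worker: JoinHandle<io::Result<W>>,
}

fn worker_gone() -> io::Error {
    io::Error::new(io::ErrorKind::BrokenPipe, "writer thread exited early")
}

impl<W: Write + Send + 'static> DoubleBufferedWriter<W> {
    pub fn new(mut out: W, capacity: usize) -> Self {
        let (full_tx, full_rx) = channel::<Vec<u8>>();
        let (empty_tx, empty_rx) = channel::<Vec<u8>>();
        // Seed the return channel with the second buffer.
        empty_tx.send(Vec::with_capacity(capacity)).unwrap();

        let worker = thread::spawn(move || -> io::Result<W> {
            // Runs until every sender for `full_rx` has been dropped.
            for mut buf in full_rx {
                out.write_all(&buf)?;
                buf.clear();
                // The producer may already be gone; that is not an error.
                let _ = empty_tx.send(buf);
            }
            out.flush()?;
            Ok(out)
        });

        DoubleBufferedWriter {
            current: Vec::with_capacity(capacity),
            capacity,
            full_tx,
            empty_rx,
            worker,
        }
    }

    /// Appends `bytes`; swaps buffers first if they would not fit.
    pub fn write(&mut self, bytes: &[u8]) -> io::Result<()> {
        if !self.current.is_empty() && self.current.len() + bytes.len() > self.capacity {
            // Blocks only while the writer thread still holds the other buffer.
            let empty = self.empty_rx.recv().map_err(|_| worker_gone())?;
            let full = mem::replace(&mut self.current, empty);
            self.full_tx.send(full).map_err(|_| worker_gone())?;
        }
        self.current.extend_from_slice(bytes);
        Ok(())
    }

    /// Sends the last buffer, waits for the writer thread, returns the writer.
    pub fn finish(self) -> io::Result<W> {
        let DoubleBufferedWriter { current, full_tx, worker, .. } = self;
        let _ = full_tx.send(current);
        drop(full_tx); // closing the channel lets the worker loop end
        worker.join().expect("writer thread panicked")
    }
}
```

In a shared sink this would still sit behind the mutex, but the lock would be held only for the buffer swap rather than for the duration of the file write.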

michaelwoerister mentioned this pull request Nov 15, 2019

Only use 48 bits for encoding timestamps and 32 bits for encoding thread IDs in RawEvent in order to make it smaller. #86

Merged

wesleywiser self-assigned this Nov 15, 2019

wesleywiser approved these changes Nov 19, 2019


michaelwoerister force-pushed the opt-file-sink branch from 3a6a0df to 58dbbd9 Compare November 20, 2019 14:15

michaelwoerister added 3 commits November 20, 2019 15:15

Optimize FileSerializationSink by using parking_lot::Mutex and avoiding heap allocations in write_atomic. (f3c7f67)

Improve SerializationSink benchmarks and tests. (7199065)

Support writing arbitrarily-sized amounts of data via FileSerializationSink. (b9d0111)

michaelwoerister force-pushed the opt-file-sink branch from 58dbbd9 to b9d0111 Compare November 20, 2019 14:43

andjo403 reviewed Nov 20, 2019


wesleywiser approved these changes Nov 22, 2019


wesleywiser merged commit 665b384 into rust-lang:master Nov 22, 2019
