Skip to content

Conversation

@Myasuka
Copy link
Contributor

@Myasuka Myasuka commented Oct 23, 2020

RocksDB introduce performance regression via replaced __thread with thread_local keyword.
This PR reverts PerfContext back to __thread and abort some usage as it would not be used Flink.

@Myasuka Myasuka marked this pull request as ready for review October 26, 2020 02:45
@carp84
Copy link

carp84 commented Oct 26, 2020

@Myasuka could you attach the performance data before / after this change / fix? Thanks

@Myasuka
Copy link
Contributor Author

Myasuka commented Oct 26, 2020

Below is the experimental result after apply this fix on a linux machine with 500GB memory in total.
As we can see, almost all cases behave better than original 6.11

Benchmark original RocksDB-6.11 throughput Fixed RocksDB-6.11 throughput
org.apache.flink.state.benchmark.ListStateBenchmark.listAdd 489.340911 ± 5.614968 ops/ms 507.793520 ± 7.887317 ops/ms
org.apache.flink.state.benchmark.ListStateBenchmark.listAddAll 289.259124 ± 4.757700 ops/ms 299.358497 ± 10.935809 ops/ms
org.apache.flink.state.benchmark.ListStateBenchmark.listAppend 460.467266 ± 6.567233 ops/ms 483.354214 ± 6.141290 ops/ms
org.apache.flink.state.benchmark.ListStateBenchmark.listGet 122.195119 ± 0.583669 ops/ms 138.067541 ± 0.519469 ops/ms
org.apache.flink.state.benchmark.ListStateBenchmark.listGetAndIterate 122.376738 ± 0.705641 ops/ms 137.312238 ± 1.323462 ops/ms
org.apache.flink.state.benchmark.ListStateBenchmark.listUpdate 495.459603 ± 4.561395 ops/ms 509.520106 ± 5.077400 ops/ms
org.apache.flink.state.benchmark.MapStateBenchmark.mapAdd 431.088394 ± 20.468059 ops/ms 446.252946 ± 22.396690 ops/ms
org.apache.flink.state.benchmark.MapStateBenchmark.mapContains 48.730236 ± 0.457412 ops/ms 54.539764 ± 0.118094 ops/ms
org.apache.flink.state.benchmark.MapStateBenchmark.mapEntries 329.641797 ± 7.774533 ops/ms 347.769279 ± 8.160462 ops/ms
org.apache.flink.state.benchmark.MapStateBenchmark.mapGet 49.164114 ± 0.177340 ops/ms 53.877028 ± 0.265754 ops/ms
org.apache.flink.state.benchmark.MapStateBenchmark.mapIsEmpty 42.816571 ± 0.139690 ops/ms 46.796282 ± 0.214785 ops/ms
org.apache.flink.state.benchmark.MapStateBenchmark.mapIterator 331.323677 ± 7.935423 ops/ms 351.181264 ± 8.571055 ops/ms
org.apache.flink.state.benchmark.MapStateBenchmark.mapKeys 321.080059 ± 12.947762 ops/ms 349.881785 ± 8.268457 ops/ms
org.apache.flink.state.benchmark.MapStateBenchmark.mapPutAll 100.037343 ± 6.779264 ops/ms 94.216012 ± 7.537000 ops/ms
org.apache.flink.state.benchmark.MapStateBenchmark.mapRemove 426.177218 ± 29.927329 ops/ms 445.132551 ± 28.183995 ops/ms
org.apache.flink.state.benchmark.MapStateBenchmark.mapUpdate 428.597067 ± 21.329073 ops/ms 446.231137 ± 18.235515 ops/ms
org.apache.flink.state.benchmark.MapStateBenchmark.mapValues 330.840970 ± 7.683108 ops/ms 349.290260 ± 6.695283 ops/ms
org.apache.flink.state.benchmark.ValueStateBenchmark.valueAdd 426.132608 ± 22.066362 ops/ms 436.521025 ± 31.498314 ops/ms
org.apache.flink.state.benchmark.ValueStateBenchmark.valueGet 620.187399 ± 44.498021 ops/ms 670.520548 ± 30.156394 ops/ms
org.apache.flink.state.benchmark.ValueStateBenchmark.valueUpdate 427.154900 ± 27.734762 ops/ms 434.678702 ± 31.288398 ops/ms

@carp84
Copy link

carp84 commented Oct 27, 2020

Thanks for the follow up @Myasuka , from the numbers we could see a stable performance gain for almost all cases of our state micro benchmark.

@Myasuka
Copy link
Contributor Author

Myasuka commented Oct 27, 2020

The key performance regression is introduced by usage of thread-local, and this PR mainly revert the usage of thread-local back to __thread and remove related usage.
Apart from existing tests, I also run it with my program to check the statistics like block cache hit/miss count could still get as expected with this PR.

@Myasuka Myasuka force-pushed the revert-thread-local branch from 99cee73 to 20e9e08 Compare October 27, 2020 08:06
@Myasuka Myasuka changed the title [FLINK-19710] Revert PerfContext back to __thread [FLINK-19710] Revert implementation of PerfContext back to __thread to avoid performance regression Oct 27, 2020
@azagrebin
Copy link
Contributor

azagrebin commented Oct 28, 2020

Thanks for opening the PR @Myasuka
As I understand the main problem is the replacement of __thread with thread_local. The code reversion looks good for this.

Could you write more details about why we need the other code removal?
If it is not going to be used in Flink why does it hurt? or is it also somehow related to the performance regression?

@Myasuka
Copy link
Contributor Author

Myasuka commented Oct 29, 2020

@azagrebin
In a nutshell, we need to revert thread-local back to __thread. However, initializer for thread-local variable must be a constant expression (refer to gcc-tls), that's why I had to drop all those constructors and destructor. That's why I need to drop tests add in facebook/rocksdb#4919 and add default initializer for those values in the struct of PerfContext

I also leave the PerfContextByLevel there to not drop them.

Copy link
Contributor

@azagrebin azagrebin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation @Myasuka LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants