1.x: optimize merge/flatMap for empty sources #3761
Conversation
Just a general question about perf testing... in the development of SyncOnSubscribe we wrote a perf test that used the …
My perfs measure the overhead of the infrastructure where the subscriber does nothing else. This is like an upper bound for the throughput you can achieve. Clearly, if you have …
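For context, an overhead-only benchmark of this kind might look like the following JMH sketch. It is illustrative only (class, method, and parameter names are made up, not the actual benchmark in this repository): the subscriber merely hands each value to the Blackhole, so what gets measured is the operator infrastructure itself.

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

import rx.Observable;

// Hypothetical overhead-only benchmark: the subscriber does no work beyond
// consuming the value, so the result is an upper bound on achievable throughput.
@State(Scope.Thread)
public class FlatMapOverheadPerf {
    @Param({ "1", "1000", "1000000" })
    public int count;

    Observable<Integer> source;

    @Setup
    public void setup() {
        source = Observable.range(1, count)
                .flatMap(v -> Observable.just(v));
    }

    @Benchmark
    public void flatMapOfJust(Blackhole bh) {
        source.subscribe(v -> bh.consume(v));
    }
}
```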
I hear what you are saying; however, sleeping is very different from consuming CPU cycles. I completely agree that testing the lower bounds of performance is valuable. Right now we are testing very common use cases. However, another common use case is one where other work is done in business logic. Using the …
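As a hedged illustration of that point (not code from this PR), JMH's Blackhole.consumeCPU can stand in for per-item business logic, so the operator overhead is measured against a consumer that actually burns cycles rather than sleeping. The class name and the workTokens parameter below are hypothetical.

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

import rx.Observable;

// Hypothetical benchmark: "workTokens" simulates business logic per item by
// consuming a fixed number of CPU tokens instead of sleeping.
@State(Scope.Thread)
public class FlatMapWithWorkPerf {
    @Param({ "0", "10", "100" })
    public int workTokens;

    @Benchmark
    public void flatMapWithPerItemWork(Blackhole bh) {
        Observable.range(1, 100_000)
                .flatMap(v -> Observable.just(v))
                .subscribe(v -> {
                    Blackhole.consumeCPU(workTokens); // simulated per-item work
                    bh.consume(v);
                });
    }
}
```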
Also, there is the matter of the JIT compiler. I am not entirely sure, but wouldn't this prevent inlining the Func1? This surely is a common use case that we are missing in these perf tests.
Our infrastructure is full of atomic operations that take 21-45 cycles on a good day and cause write-buffer flushes even with synchronous code. I think the … Primarily, call depth/stack depth is the limiting factor for JIT: the fewer layers there are and the smaller the methods are, the more JIT can do. This is why I advocate for flatMap() instead of merge(): merge(map()) allocates more and pushes through more layers than flatMap(), which has the function call and the use of its result right next to each other. JIT inlines such a Func1 quite nicely, and with such barebone perfs, inlining failures also show up as a throughput loss. However, just by looking at the code, only JIT experts can tell what happens. There is the JITWatch tool that does a better job, but it requires some nasty DLLs to be built for Windows and thus I don't use it.
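To make the flatMap() vs. merge(map()) point concrete, here is a small sketch (illustrative only) of the two equivalent formulations: the first keeps the Func1 call and the consumption of its result inside a single operator, while the second inserts an extra map stage between them.

```java
import rx.Observable;
import rx.functions.Func1;

public class FlatMapVsMerge {
    public static void main(String[] args) {
        Observable<Integer> source = Observable.range(1, 10);
        Func1<Integer, Observable<Integer>> f = v -> Observable.just(v * 2);

        // flatMap: the function call and the use of its result live in one operator
        Observable<Integer> viaFlatMap = source.flatMap(f);

        // merge(map(...)): same output, but an extra map operator sits between
        // the function call and the merge machinery
        Observable<Integer> viaMergeMap = Observable.merge(source.map(f));

        viaFlatMap.subscribe(System.out::println);
        viaMergeMap.subscribe(System.out::println);
    }
}
```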
I think that what @stealthcode is referring to is the fact that most microbenchmarks test a tiny piece of code in a contrived way, AFAIK. Regarding the JIT, as you mentioned, call depth is a limiting factor, but AFAIK the main one is the byte-code size of the method. Thus, a big method is less likely to be inlined, and then it is less likely that beneficial optimizations will take place (dead-code elimination, escape analysis, ...). That being said, the modification you proposed is relatively minimal (1 test, 1 method call), and the impact on the byte-code size is small. So 👍 for this change. PS: JITWatch is a very good tool, especially when you want to learn what the JVM is doing.
👍 The comparison looks fantastic.
Just to be clear, my previous comment was a 👍.
This PR reduces the overhead when one merges/flatMaps empty() sequences.

Benchmark results (i7 4770K, Windows 7 x64, Java 8u72): for rare empty(), the overhead seems to be around the noise level.
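For illustration only (the values and structure below are made up, not taken from the benchmark), the case this PR targets is a flatMap/merge where many of the inner sequences complete without emitting anything:

```java
import rx.Observable;

public class EmptyInnerSourcesExample {
    public static void main(String[] args) {
        // Every odd value maps to an empty inner sequence; handling such
        // empty inners cheaply is the scenario this PR optimizes.
        Observable.range(1, 10)
                .flatMap(v -> (v % 2 == 0)
                        ? Observable.just(v)
                        : Observable.<Integer>empty())
                .subscribe(System.out::println);
    }
}
```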