Skip to content

[Merge] optimize tasks creation and remove sendable constraint #193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

twittemb
Copy link
Contributor

@twittemb twittemb commented Sep 6, 2022

Hi

To follow this discussion #192, here is an attempt to reach several goals:

  • minimise the task creation (1 per upstream + 1 main for the group)
  • have conditional Sendable conformance for AsyncMergeSequences
  • make a MergeStateMachine that can be reused with any number of sequences (preparing the field for variadic params)
  • be able to use the same MergeStateMachine in AsyncChunksOfCountOrSignalSequence
  • improve the performances
  • reuse the existing locking mechanism

Some hints about the conception:

  • I've introduced the concept of Regulator. This is a wrapper around an AsyncSequence that can iterates over it and suspend before calling next until it is resumed by an external entity. I think this is something we can leverage also in AsyncCombineLatestSequence where we will also want to avoid creating too much tasks. The Regulator has its own state, not shared with any one else, there is no synchronisation point with the outside.
  • The MergeStateMachine is no more specialised with several AsyncSequence generic types but only by the Element type, which opens the door to variadic params. The MergeStateMachine has its own state also, independent from the upstreams.

Performance wise, this is what I have measured on my M1 Ultra:

Currently:

  • Merge2: 150 000 events/s
  • Merge3: 115 000 events/s

My PR:

  • Merge2: 320 000 events/s
  • Merge3: 260 000 events/s

I've added a Merge4 operation in the throughput test just to show that it is now possible to merge any number of AsyncSequence.

There is a significant improvement. @FranzBusch might have even better results with his implementation and perhaps we can improve mine.

@twittemb twittemb marked this pull request as draft September 7, 2022 21:05
@FranzBusch
Copy link
Member

Hey @twittemb,

I just had a brief look over your last push and your are getting closer and closer to what I have done in the other PR. However I spotted a couple of things that are different and won't work like stuffing the bases into an Array. In the end, we already have what we want in the other PR and it just requires the last final touches.

I would love to understand what you want to achieve that the other PR doesn't achieve?

@twittemb twittemb marked this pull request as ready for review September 8, 2022 16:40
@twittemb
Copy link
Contributor Author

twittemb commented Sep 8, 2022

Hey @twittemb,

I just had a brief look over your last push and your are getting closer and closer to what I have done in the other PR. However I spotted a couple of things that are different and won't work like stuffing the bases into an Array. In the end, we already have what we want in the other PR and it just requires the last final touches.

I would love to understand what you want to achieve that the other PR doesn't achieve?

Hi @FranzBusch

I've updated the description to explain my intentions. My intent was to have some components that are reusable (like I said, AsyncCombineLatestSequence will have to be refactored as well IMO) and performant and that are ready for variadic params without having to duplicate the code every time.

Could you explain the "array" thing please ? (by the way the implementation of the variadic params here is just a "demo" to show the intent)

@FranzBusch
Copy link
Member

Hey @twittemb,
I just had a brief look over your last push and your are getting closer and closer to what I have done in the other PR. However I spotted a couple of things that are different and won't work like stuffing the bases into an Array. In the end, we already have what we want in the other PR and it just requires the last final touches.
I would love to understand what you want to achieve that the other PR doesn't achieve?

Hi @FranzBusch

I've updated the description to explain my intentions. My intent was to have some components that are reusable (like I said, AsyncCombineLatestSequence will have to be refactored as well IMO) and performant and that are ready for variadic params without having to duplicate the code every time.

Could you explain the "array" thing please ? (by the way the implementation of the variadic params here is just a "demo" to show the intent)

While I fully understand that you wanna abstract things away (this is something we programmers always want to do quite quickly) I personally don't believe this is the right approach for this package with the exception of some algorithms that can be trivially implemented on top of others.

The reason why I think this is a no-goal is that abstractions often incur some kind of cost. Either performance, size or usability. Furthermore, coming up with arbitrary abstractions makes it harder to understand what the implementation does and reason if it does the right thing. In the end, one should be able to look at the implementation of the code and go trough the various states that an algorithm can have and understand how it reacts. In your implementation, I find it personally very hard to follow what goes on since there are types in the mix that are not very self-explaining and there is no comments along the way.

On the last note why using an Array is bad. The array results in a heap allocation which makes this algorithms worse performance wise. My implementation is also ready for variadic generics we just need to replace the abstract state machine's generic types with a variadic generic.

@twittemb
Copy link
Contributor Author

twittemb commented Sep 9, 2022

Hi @FranzBusch

Thanks for you detailed answer. When I introduced the Regulator, my intent was not the randomly abstract things but to provide a tool that could be reused in several situations (trying to not fall in the pit of early optimisation). I can imagine that a sequence that can be suspended/resumed at will can be a useful tool (merge and combineLatest are 2 good examples IMO). In the test package we also have a GatedSequence as a test tool and perhaps we could merge those implementations.

As an OpenSource package I think it is also important for contributors to suggest new things based on the tools we could provide (the existing locking mechanism is a nice example), but that's just my opinion and might not be relevant (also it is my first time contributing to an Apple repo) :-)

I will cancel this PR and the associated Issue to remove the noise. My last concern was about AsyncChunksOfCountOrSignalSequence and the fact that it still uses the old Merge2StateMachine from what I understand. Is this something you plan to address ? I feel it would be bad to maintain 2 different implementations, and the old one is not as performant as yours.

@phausler one last though about this ?

@FranzBusch
Copy link
Member

Hi @FranzBusch

Thanks for you detailed answer. When I introduced the Regulator, my intent was not the randomly abstract things but to provide a tool that could be reused in several situations (trying to not fall in the pit of early optimisation). I can imagine that a sequence that can be suspended/resumed at will can be a useful tool (merge and combineLatest are 2 good examples IMO). In the test package we also have a GatedSequence as a test tool and perhaps we could merged those implementations.

As an OpenSource package I think it is also important for contributors to suggest new things based on the tools we could provide (the existing locking mechanism is a nice example), but that's just my opinion and might not be relevant (also it is my first time contributing to an Apple repo) :-)

Don't get me wrong, we are super happy with any contributions we get and you already have provided some very nice improvements throughout this package. Please keep them coming, we really appreciate them!

I will cancel this PR and the associated Issue to remove the noise. My last concern was about AsyncChunksOfCountOrSignalSequence and the fact that it still uses the old Merge2StateMachine from what I understand. Is this something you plan to address ? I feel it would be bad to maintain 2 different implementations, and the old one is not as performant as yours.

You are right there is still more to migrate but I wanted to keep the PR focused and chunked hasn't been pitched yet. Though as you said we really need to reimplement this!

There are also more algorithms in here that currently constrain the Sendable requirement of the AsyncIterator which we should revisit and understand if they are really required.

@twittemb
Copy link
Contributor Author

twittemb commented Sep 9, 2022

Thanks for your kind words. Always happy to help, and will continue to try to. looking forward to you next implementations.
Have a nice one.

@twittemb twittemb closed this Sep 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants