Separated context and state for easier parallelization. #494


Closed

Conversation


@sandrohanea sandrohanea commented Feb 12, 2023

#475

  • Separated the state of the transformation into its own struct, with tests.
  • Fixed examples (stream, command, talk, main): since these do not use the callbacks when parsing the results, they need to call whisper_full_with_state instead of whisper_full so they can read the results from the state after the transformation finishes.
  • Fixed the rest of the examples.
  • Fixed the bindings and their examples.

Note:
For the bindings to use the same context with multiple different transformations in parallel (one state per transformation), some adjustments are still needed, but the default cases already work.

std::vector<float> probs;
std::vector<float> logits;
std::vector<float> logprobs;
std::vector<float> probs{};
@sandrohanea (Contributor Author) commented:

Fought with a read-access-denied exception for almost 2 hours, only to find out that it's a bug in the compiler (debug builds only): microsoft/STL#1934

The workaround is just to initialize the struct (otherwise, it's pretty hard to debug).

@sandrohanea sandrohanea changed the title [WIP] Separated context and state for easier parallelization. Separated context and state for easier parallelization. Feb 13, 2023
sandrohanea commented Feb 13, 2023

@ggerganov please take a look when you have some time.

It will be pretty hard to keep this branch up to date and resolve conflicts if other changes land.

@@ -10,20 +10,20 @@

constexpr int N_THREAD = 8;

// TODO: get rid of this vector of contexts - bad idea in the first place
@sandrohanea (Contributor Author) commented Feb 13, 2023:
Fixed all these TODOs: only one context is needed now, plus a vector of states.

@RndyP

RndyP commented Feb 13, 2023

I've been following your work here. Not sure if this is related, but I profiled Whisper under the Visual C++ profiling tool, and there is a section of code in ggml.c using a spinlock that takes 10% of CPU time. Shouldn't this be using a mutex?
[profiler screenshot: spin-wait loop in ggml.c]

@sandrohanea

Hello @RndyP,
Indeed, it looks like that code can be optimized, but it's not in the scope of this PR. This PR is meant to split whisper_context into whisper_context + whisper_state so that whisper_context can be reused and is thread-safe.

About that spinlock, I agree it would be nice to change it. Could you maybe create another issue about your findings?

@RndyP

RndyP commented Feb 13, 2023

I reported this in issue #300, which has fallen down the list. I don't understand the code enough to suggest a fix myself.
In MSVC, atomic_load() looks like this:

static LONG atomic_load(atomic_int * ptr) {
    return InterlockedCompareExchange(ptr, 0, 0);
}

I believe InterlockedCompareExchange() is simply locking the data values, and the while() is simply spinning and waiting for the value to change.

@ggerganov (Member) commented:

@sandrohanea
Thanks for this nice work! I will need some time to think about this change before merging. I kind of want to keep the existing API the same if possible and add extra functions for your use case, but not sure if this is better compared to what you have proposed here.

Don't worry about keeping up-to-date - I will be able to do that if necessary.

@sandrohanea

Thanks a lot for taking the time, it totally makes sense.

Also, thanks again for creating the whole library in the first place, really titanic work to port all tensor operations and everything.

I was also thinking about keeping a "default state" in the context and using it when no other state is provided; that way the change wouldn't break existing functionality.

On the other hand, a context that is thread-safe only "sometimes" is really easy to misuse.

It's your call on this one. If you have any idea how this could be done better, please, let me know and I'll be happy to help.

@sandrohanea

Closing this, as the alternative with an opt-in state was merged.

@sandrohanea sandrohanea closed this Mar 6, 2023
@sandrohanea sandrohanea deleted the feature/separateCtxAndState branch March 6, 2023 09:32