
Support broadcast addition for fp32 #359


Merged
merged 2 commits on Jul 11, 2023

Conversation

@li-plus (Contributor) commented Jul 9, 2023

This PR adds support for broadcast addition of two fp32 tensors, as long as they are broadcastable in the row dimension; they no longer need to have the same shape. Implemented on both CPU and CUDA.
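
For context, a minimal sketch of what the new behavior enables: adding a single bias row onto every row of a matrix without an explicit repeat. This assumes the ggml graph API of mid-2023 (ggml_build_forward / ggml_graph_compute; the API has changed since), and the shapes and values are illustrative:

```c
// Hedged sketch (not from the PR): broadcast-add a single row b onto every
// row of a. Assumes the ggml graph API of mid-2023 (it has since changed).
#include <stdio.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        .mem_size   = 16*1024*1024,  // 16 MB arena, arbitrary
        .mem_buffer = NULL,
        .no_alloc   = false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // a: 4 rows x 8 cols, b: 1 row x 8 cols (broadcast along rows)
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8, 4);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8, 1);
    ggml_set_f32(a, 1.0f);
    ggml_set_f32(b, 2.0f);

    // Before this PR, b had to be expanded first: ggml_repeat(ctx, b, a).
    struct ggml_tensor * c = ggml_add(ctx, a, b);

    struct ggml_cgraph gf = ggml_build_forward(c);
    ggml_graph_compute(ctx, &gf);

    printf("c[0] = %f\n", ggml_get_f32_1d(c, 0));  // expect 3.0
    ggml_free(ctx);
    return 0;
}
```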

@slaren (Member) commented Jul 9, 2023

For CUDA, it may be better to do this with a single kernel launch, similar to the way broadcasting is implemented for mul. @JohannesGaessler what do you think?

@JohannesGaessler (Collaborator) commented

Each CUDA kernel launch adds overhead, so it is preferable to launch as few kernels as reasonably possible. I agree with slaren that an implementation that does the broadcasting with a single kernel launch would be better.
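
For illustration, a minimal sketch (not the PR's actual kernel) of how a row-broadcast add can be done in one launch, mapping each thread's column index onto the bias row with a modulo:

```cuda
// Illustrative sketch: broadcast-add a bias row b of length ncols onto
// every row of a with a single kernel launch, instead of one per row.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void add_broadcast_row_f32(const float * a, const float * b,
                                      float * dst, const int ncols, const int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    dst[i] = a[i] + b[i % ncols];  // i % ncols selects the broadcast element
}

int main() {
    const int ncols = 8, nrows = 4, n = ncols * nrows;
    float *a, *b, *dst;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, ncols * sizeof(float));
    cudaMallocManaged(&dst, n * sizeof(float));
    for (int i = 0; i < n; ++i) a[i] = 1.0f;
    for (int i = 0; i < ncols; ++i) b[i] = 2.0f;

    // one launch covers the whole tensor
    add_broadcast_row_f32<<<(n + 255) / 256, 256>>>(a, b, dst, ncols, n);
    cudaDeviceSynchronize();

    printf("dst[0] = %f\n", dst[0]);  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(dst);
    return 0;
}
```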

@li-plus (Contributor, Author) commented Jul 10, 2023

Thanks for the advice! You are right. The broadcast add is now computed with a single kernel launch, which gives a large performance improvement in the linear layers.

@ggerganov (Member) left a comment


Nice change!

It would be great to take advantage of ggml_add() now being broadcastable to simplify the inference code of some of the example models. For example, ggml_repeat() is no longer needed here (see the sketch at the end of this comment):

https://github.com/ggerganov/whisper.cpp/blob/4774d2feb01a772a15de81ffc34b34a1f294f020/examples/talk/gpt-2.cpp#L469-L471

https://github.com/ggerganov/whisper.cpp/blob/4774d2feb01a772a15de81ffc34b34a1f294f020/examples/talk/gpt-2.cpp#L449-L453

Also, the Metal and OpenCL implementations need to be handled - adding GGML_ASSERT(false && "not implemented") for now is fine.
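
As a hedged illustration of the suggested simplification, a before/after fragment in the style of the linked gpt-2.cpp code (the tensor names below are made up, not copied from the file):

```c
// before: the bias had to be repeated to match cur's shape
cur = ggml_add(ctx0, ggml_repeat(ctx0, model.layers[il].c_mlp_fc_b, cur), cur);

// after: ggml_add() broadcasts the bias row across cur directly
cur = ggml_add(ctx0, cur, model.layers[il].c_mlp_fc_b);

// and in the Metal / OpenCL backends, until broadcasting is implemented there:
GGML_ASSERT(false && "not implemented");
```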
