
Support broadcast addition for fp32 #359


Merged
merged 2 commits on Jul 11, 2023

Conversation

@li-plus (Contributor) commented Jul 9, 2023

This PR adds support for broadcast addition of two fp32 tensors, as long as they are broadcastable in the row dimension; they no longer need to have the same shape. Implemented on both CPU and CUDA.
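
For context, a minimal sketch of what the new behavior enables: adding a single bias row onto every row of a matrix without an explicit repeat. This assumes the ggml graph API of mid-2023 (ggml_build_forward / ggml_graph_compute; the API has changed since), and the shapes and values are illustrative:

```c
// Hedged sketch (not from the PR): broadcast-add a single row b onto every
// row of a. Assumes the ggml graph API of mid-2023 (it has since changed).
#include <stdio.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        .mem_size   = 16*1024*1024,  // 16 MB arena, arbitrary
        .mem_buffer = NULL,
        .no_alloc   = false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // a: 4 rows x 8 cols, b: 1 row x 8 cols (broadcast along rows)
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8, 4);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8, 1);
    ggml_set_f32(a, 1.0f);
    ggml_set_f32(b, 2.0f);

    // Before this PR, b had to be expanded first: ggml_repeat(ctx, b, a).
    struct ggml_tensor * c = ggml_add(ctx, a, b);

    struct ggml_cgraph gf = ggml_build_forward(c);
    ggml_graph_compute(ctx, &gf);

    printf("c[0] = %f\n", ggml_get_f32_1d(c, 0));  // expect 3.0
    ggml_free(ctx);
    return 0;
}
```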

@slaren (Member) commented Jul 9, 2023

For CUDA, it may be better to do this with a single kernel launch, similar to the way broadcasting is implemented for mul. @JohannesGaessler what do you think?

@JohannesGaessler (Collaborator) commented

Each CUDA kernel launch adds overhead, so it is preferable to launch as few kernels as reasonably possible. I agree with slaren that an implementation that does the broadcasting with a single kernel launch would be better.
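
For illustration, a minimal sketch (not the PR's actual kernel) of how a row-broadcast add can be done in one launch, mapping each thread's column index onto the bias row with a modulo:

```cuda
// Illustrative sketch: broadcast-add a bias row b of length ncols onto
// every row of a with a single kernel launch, instead of one per row.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void add_broadcast_row_f32(const float * a, const float * b,
                                      float * dst, const int ncols, const int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    dst[i] = a[i] + b[i % ncols];  // i % ncols selects the broadcast element
}

int main() {
    const int ncols = 8, nrows = 4, n = ncols * nrows;
    float *a, *b, *dst;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, ncols * sizeof(float));
    cudaMallocManaged(&dst, n * sizeof(float));
    for (int i = 0; i < n; ++i) a[i] = 1.0f;
    for (int i = 0; i < ncols; ++i) b[i] = 2.0f;

    // one launch covers the whole tensor
    add_broadcast_row_f32<<<(n + 255) / 256, 256>>>(a, b, dst, ncols, n);
    cudaDeviceSynchronize();

    printf("dst[0] = %f\n", dst[0]);  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(dst);
    return 0;
}
```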

@li-plus (Contributor, Author) commented Jul 10, 2023

Thanks for the advice! You are right. The broadcast add is now computed with a single kernel launch, which gives a large performance improvement in the linear layers.

@ggerganov (Member) left a comment


Nice change!

It would be great to take advantage of ggml_add() now being broadcastable to simplify the inference code of some of the example models. For example, ggml_repeat() is no longer needed here (see the sketch at the end of this comment):

https://github.com/ggerganov/whisper.cpp/blob/4774d2feb01a772a15de81ffc34b34a1f294f020/examples/talk/gpt-2.cpp#L469-L471

https://github.com/ggerganov/whisper.cpp/blob/4774d2feb01a772a15de81ffc34b34a1f294f020/examples/talk/gpt-2.cpp#L449-L453

Also, the Metal and OpenCL implementations need to be handled - adding GGML_ASSERT(false && "not implemented") for now is fine.
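
As a hedged illustration of the suggested simplification, a before/after fragment in the style of the linked gpt-2.cpp code (the tensor names below are made up, not copied from the file):

```c
// before: the bias had to be repeated to match cur's shape
cur = ggml_add(ctx0, ggml_repeat(ctx0, model.layers[il].c_mlp_fc_b, cur), cur);

// after: ggml_add() broadcasts the bias row across cur directly
cur = ggml_add(ctx0, cur, model.layers[il].c_mlp_fc_b);

// and in the Metal / OpenCL backends, until broadcasting is implemented there:
GGML_ASSERT(false && "not implemented");
```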
