[Feature request] WASM WebGPU #103

Open
mark-beeby opened this issue Oct 27, 2022 · 7 comments
Labels
question Further information is requested

Comments

@mark-beeby

It's clear that leveraging a GPU makes processing faster, and I believe in principle WebGPU is available in SIMD. Is it even feasible to integrate with the GPU where available, in Chrome etc.?

@mark-beeby mark-beeby changed the title [Feature request] WebGPU [Feature request] WASM WebGPU Oct 27, 2022
@ggerganov ggerganov added the question Further information is requested label Oct 27, 2022
@ggerganov
Member

I'm not familiar with the WebGPU API.
If you demonstrate a basic matrix multiplication example using WebGPU, and it does not look too complicated, I might give it a try.
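
For reference, a basic WebGPU matrix multiplication might look roughly like the sketch below. It is only an illustration under stated assumptions: the names `MATMUL_WGSL`, `matmulCpu` and `matmulGpu` are hypothetical and do not come from whisper.cpp or ggml, the kernel is the naive one-invocation-per-output-element version with no tiling, and the WebGPU objects are accessed untyped (via `any`) so the snippet does not depend on the `@webgpu/types` package. The CPU function is included as a reference to compare results against.

```typescript
// WGSL compute shader: each invocation computes one element of C = A * B.
// A is m x k, B is k x n, C is m x n, all row-major f32.
// dims packs (m, n, k, unused) into one vec4<u32> uniform.
const MATMUL_WGSL = `
@group(0) @binding(0) var<storage, read> a : array<f32>;
@group(0) @binding(1) var<storage, read> b : array<f32>;
@group(0) @binding(2) var<storage, read_write> c : array<f32>;
@group(0) @binding(3) var<uniform> dims : vec4<u32>;

@compute @workgroup_size(8, 8)
fn main(@builtin(global_invocation_id) gid : vec3<u32>) {
  let m = dims.x;
  let n = dims.y;
  let k = dims.z;
  let row = gid.y;
  let col = gid.x;
  if (row >= m || col >= n) { return; }
  var acc = 0.0;
  for (var i = 0u; i < k; i = i + 1u) {
    acc = acc + a[row * k + i] * b[i * n + col];
  }
  c[row * n + col] = acc;
}
`;

// Plain CPU reference implementation, used to check the GPU result.
function matmulCpu(a: Float32Array, b: Float32Array, m: number, n: number, k: number): Float32Array {
  const c = new Float32Array(m * n);
  for (let row = 0; row < m; row++) {
    for (let col = 0; col < n; col++) {
      let acc = 0;
      for (let i = 0; i < k; i++) acc += a[row * k + i] * b[i * n + col];
      c[row * n + col] = acc;
    }
  }
  return c;
}

// Runs the shader above. Resolves to null when WebGPU is not available.
async function matmulGpu(a: Float32Array, b: Float32Array, m: number, n: number, k: number): Promise<Float32Array | null> {
  const g = globalThis as any; // untyped access: assumption, avoids @webgpu/types
  const gpu = g.navigator?.gpu;
  if (!gpu) return null;
  const adapter = await gpu.requestAdapter();
  if (!adapter) return null;
  const device = await adapter.requestDevice();
  const usage = g.GPUBufferUsage;

  // Helper: create a buffer and copy host data into it.
  const upload = (data: Float32Array | Uint32Array, flags: number) => {
    const buf = device.createBuffer({ size: data.byteLength, usage: flags | usage.COPY_DST });
    device.queue.writeBuffer(buf, 0, data);
    return buf;
  };
  const bufA = upload(a, usage.STORAGE);
  const bufB = upload(b, usage.STORAGE);
  const bufDims = upload(new Uint32Array([m, n, k, 0]), usage.UNIFORM);
  const bytes = m * n * 4;
  const bufC = device.createBuffer({ size: bytes, usage: usage.STORAGE | usage.COPY_SRC });
  const bufRead = device.createBuffer({ size: bytes, usage: usage.MAP_READ | usage.COPY_DST });

  const pipeline = device.createComputePipeline({
    layout: "auto",
    compute: { module: device.createShaderModule({ code: MATMUL_WGSL }), entryPoint: "main" },
  });
  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [bufA, bufB, bufC, bufDims].map((buffer, binding) => ({ binding, resource: { buffer } })),
  });

  // One 8x8 workgroup per 8x8 tile of C; x covers columns, y covers rows.
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(Math.ceil(n / 8), Math.ceil(m / 8));
  pass.end();
  encoder.copyBufferToBuffer(bufC, 0, bufRead, 0, bytes);
  device.queue.submit([encoder.finish()]);

  // Map the staging buffer and copy the result out before unmapping.
  await bufRead.mapAsync(g.GPUMapMode.READ);
  const result = new Float32Array(bufRead.getMappedRange().slice(0));
  bufRead.unmap();
  return result;
}
```

In a WebGPU-capable browser, `await matmulGpu(a, b, m, n, k)` should agree with `matmulCpu(a, b, m, n, k)` up to floating-point rounding. A kernel that is actually fast would likely need workgroup-memory tiling, which this sketch leaves out.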

@niklaskorz

I have some experience with WebGPU and might have a look at this. Note that WebGPU would allow GPU-based computation without depending on any vendor-specific libraries like CUDA, not only on the web but also natively (with Vulkan, DX12 or Metal), by using dawn or wgpu.

@gut4

gut4 commented Dec 24, 2022

This may be helpful: https://github.com/juj/wasm_webgpu

@sandorkonya

@niklaskorz any chance that you would look at this? That would give this project an even further boost (or did I miss anything relevant and it has already been solved?).

@patrickinminneapolis

patrickinminneapolis commented Mar 20, 2023

I started looking into it. It's very easy to link wasm_webgpu into emscripten; then, in principle, you should be able to implement the matrix multiplication example from https://github.com/milhidaka/webgpu-blas. I have done this, but I am running into an issue with my shader. I am really curious whether WebGPU will give us real-time streaming performance.

@ggerganov
Member

ggerganov commented Mar 20, 2023

On a similar topic, recently I found this project: https://github.com/xenova/transformers.js

It runs very efficient inference of Whisper tiny using WASM. They seem to be using something called ONNX Runtime. Although adapting to such a framework is out of scope for whisper.cpp, it seems like there is still a lot to gain in the existing WASM implementation. Even without using WASM SIMD, it seems to be possible to achieve much higher performance.

I wonder if there is something that could be done in ggml to speed up the WASM processing. Even if we don't reach ONNX Runtime performance level, it would still be very nice to improve the existing speed.

Regarding WebGPU: it would be great if someone provided a PoC. Transformers.js announced they will support WebGPU soon too, so it should be possible.

Edit: Btw, is there something like WASM BLAS?

@erkkimon

Transformers.js now seems to have some kind of WebGPU implementation available. For those interested, check out this branch: xenova/whisper-web@main...experimental-webgpu

7 participants