Vulkan for Android #5739
Probably #5186?
You are on a path to waste a lot of time. I would know, because I did. This one is easy to work around, but the next one will be tough. And then, even if it is all resolved, it will be slower than a good CPU anyway.
Do you think there is room left to minimize the memory transfers in this backend? I did see some TODOs in the code suggesting something of the sort. The cost is probably too high on mobile chipsets, with their (best case) 4x16-bit memory bus.
No, that's done already. I think the issue is that the shaders are optimized for Nvidia/AMD, but these mobile GPUs work differently. The most obvious difference is the warp size of 16. Optimizing for that might help.
Yes, I noticed that CLBlast has kernel tuners (per op, sweeping some parameters), and the results they produce differ quite a bit per GPU, so it is built by generating custom headers from the tuners' output. Not that it helps very much, because that backend is really just an external BLAS; it does not offload the whole graph. But perhaps something like that could be implemented for the Vulkan shaders, if they have equivalent parameters to sweep? Btw, the warp size depends on the platform:
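For example, Nvidia warps are 32-wide while AMD wavefronts are 32- or 64-wide, so any tuning has to be per-device. A minimal C++ sketch of querying the value at runtime, assuming an already-created VkInstance and a selected VkPhysicalDevice (not ggml-vulkan's actual code):

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>

// Query the subgroup (warp/wave) size the driver reports for a device.
void print_subgroup_size(VkPhysicalDevice device) {
    VkPhysicalDeviceSubgroupProperties subgroup = {};
    subgroup.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SUBGROUP_PROPERTIES;

    VkPhysicalDeviceProperties2 props = {};
    props.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props.pNext = &subgroup;  // chain the subgroup query into the main query

    vkGetPhysicalDeviceProperties2(device, &props);  // requires Vulkan 1.1
    printf("subgroupSize: %u\n", subgroup.subgroupSize);
}
```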
Yeah, in the long term I'd like to write an auto-tuner, especially for the matrix-matrix and matrix-vector multiplication shaders. But at the moment there are more important topics. If someone else wants to give it a try, I'll help as much as I can. We can't just take CLBlast tunings, since they are for different kernels.
If you give me a list of parameters and possible ranges to sweep, I will at least try some brute-force experimentation to see if it helps anything. Also, by now it is pretty clear to me that the only way this backend works in any coherent manner on Adreno is when the Vulkan buffer is smaller than the max allocation size (1GB). I suspect that, short of fixing the driver (which I could ask QC to do, but would not hold my breath), the only real solution would be to "shard" the model for GPU offload. Have you given any thought to such ideas? I know that has been done before in a different context with very similar constraints.
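For reference, the per-allocation cap being hit here is the value a Vulkan 1.1 (or VK_KHR_maintenance3) driver reports in VkPhysicalDeviceMaintenance3Properties. A minimal sketch of reading it, again assuming a selected VkPhysicalDevice:

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>

// Print the largest single memory allocation the driver claims to support.
void print_max_allocation(VkPhysicalDevice device) {
    VkPhysicalDeviceMaintenance3Properties maint3 = {};
    maint3.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MAINTENANCE_3_PROPERTIES;

    VkPhysicalDeviceProperties2 props = {};
    props.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props.pNext = &maint3;

    vkGetPhysicalDeviceProperties2(device, &props);
    printf("maxMemoryAllocationSize: %llu bytes\n",
           (unsigned long long) maint3.maxMemoryAllocationSize);
}
```

On the Adreno driver discussed here this reports roughly 1GB, which matches the observed failure threshold.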
The first parameters to tune would be the specialization constants of the matrix-matrix multiplication shader, but it's not straightforward which combinations of them are valid; they have a bunch of constraints that I haven't documented yet. The model does get split into multiple buffers if more than maxAllocationSize or maxBufferSize of the Vulkan device is required. Even on Nvidia/AMD/Intel this is necessary, as they only allow buffers of at most 2 or 4GB.
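For anyone who wants to attempt the sweep: specialization constants are bound at pipeline creation through VkSpecializationInfo, so each candidate configuration means rebuilding the compute pipeline. A sketch with hypothetical tile-size constants (the names and constant IDs are illustrative, not the actual ones in the ggml-vulkan shaders):

```cpp
#include <vulkan/vulkan.h>
#include <cstddef>
#include <cstdint>

// Hypothetical tunables for a tiled matmul shader; the real shaders
// expose their own set of constants with their own validity constraints.
struct TileParams {
    uint32_t tile_m;
    uint32_t tile_n;
    uint32_t tile_k;
};

// Map each struct field to a specialization constant ID in the shader.
VkSpecializationInfo make_spec_info(const TileParams* params,
                                    VkSpecializationMapEntry entries[3]) {
    entries[0] = {0, (uint32_t) offsetof(TileParams, tile_m), sizeof(uint32_t)};
    entries[1] = {1, (uint32_t) offsetof(TileParams, tile_n), sizeof(uint32_t)};
    entries[2] = {2, (uint32_t) offsetof(TileParams, tile_k), sizeof(uint32_t)};

    VkSpecializationInfo info = {};
    info.mapEntryCount = 3;
    info.pMapEntries   = entries;
    info.dataSize      = sizeof(TileParams);
    info.pData         = params;  // must stay alive until pipeline creation
    return info;
}
```

The returned struct is passed via VkPipelineShaderStageCreateInfo::pSpecializationInfo when creating the compute pipeline; a tuner loop would time one matmul per candidate and keep the fastest valid combination.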
Ok, whenever you can document anything to sweep/try, let me know.

As for the model split, I suspect it does not cover all the scenarios. I did see the code that handles it in ggml_backend_alloc_ctx_tensors_from_buft(), but nowhere else besides that. And it looks like the buffer for model tensors may get allocated by ggml_backend_cpu_buffer_from_ptr() in llama.cpp:4456, because it takes that "important for Apple" path. Admittedly, I don't know the code well enough to be sure I am not misinterpreting things, but it does take that path on Adreno, so it is not clear how the max allocation would be respected. Again, consider that this is UMA with a small allocation limit, unlike Apple; this isn't like any other platform, so it might take a path you didn't expect. To check that hunch I tried disabling mmap, which should force it onto the ggml_backend_alloc_ctx_tensors_from_buft() path, but that does not help: it still reports a Vulkan buffer larger than 1GB, and still dies with DEVICE_LOST.
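To make the expectation concrete: if the split were taken on this path, a weights region larger than the cap would be carved into several buffers, along the lines of this illustrative sketch (not ggml's actual allocator):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Split a total byte count into chunks no larger than the device's
// reported max allocation size; e.g. 7 GiB with a 1 GiB cap -> 7 chunks.
std::vector<uint64_t> split_into_buffers(uint64_t total_bytes, uint64_t max_alloc) {
    std::vector<uint64_t> chunks;
    while (total_bytes > 0) {
        uint64_t n = std::min(total_bytes, max_alloc);
        chunks.push_back(n);
        total_bytes -= n;
    }
    return chunks;
}
```

A log that still shows a single Vulkan buffer above the cap, as reported above, would mean the allocation went through a code path that never consulted the limit.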
This issue was closed because it has been inactive for 14 days since being marked as stale.
System: Android 14 (Termux)
Version: latest