[Performance] Llava-cli offloading image encoding to cuda #6883
Comments
It works on my end as expected:
Any idea/pointer where I should look? Not sure why the image encoder tower seems to run on the CPU :/
@rezacopol It's unclear what parameters you're using. My guess is you didn't enable GPU offloading (e.g. with `-ngl`).
I did that and made sure things were offloaded to the GPU using `nvidia-smi`. Here are the full command and the log I get. I also tried a machine with multiple GPUs and found that running with `-sm none` helps.

```shell
./llava-cli -m ../ggml_llava-v1.5-7b/ggml-model-q5_k.gguf \
    --mmproj ../ggml_llava-v1.5-7b/mmproj-model-f16.gguf \
    --image ../photos/kitchen_4p.jpg \
    -p "describe the photo in detail" \
    -ngl 100 -sm none
```
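One quick way to confirm the language model's layers actually went to the GPU is to grep the startup log. This is a minimal sketch: it assumes llama.cpp's usual `offloaded N/N layers to GPU` log line, and uses a stand-in log written by `printf` in place of a real llava-cli run (which needs a CUDA GPU).

```shell
# Real run (needs a GPU; paths as in the command above):
#   ./llava-cli ... -ngl 100 -sm none 2> run.log
# Stand-in log line for illustration, mimicking llama.cpp's loader output:
printf 'llm_load_tensors: offloaded 33/33 layers to GPU\n' > run.log

# Extract the offload summary; if the left number is 0, nothing was offloaded.
grep -o 'offloaded [0-9]*/[0-9]* layers to GPU' run.log
```

Note that this only covers the language-model layers; the CLIP/mmproj image encoder is a separate component, which is why its timing can differ even when the log shows full offload.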
This issue was closed because it has been inactive for 14 days since being marked as stale.
@rezacopol did you ever solve this problem?
Running on CUDA 12.2.
Running llava 1.5 via llava-cli, image-encoding time is 10x worse than on a Mac M2. I saw this thread, which seemed to fix the issue, but my numbers tell a different story. Any insight?
Running on M2:
Running on CUDA: