I think a recent change might have caused this. I'm using llama-2-7b-chat.Q4_K_M.gguf in a local Q&A RAG pipeline built with LlamaIndex. I developed a proof of concept on a machine running llama-cpp-python version 0.2.13 and saw this in the load output:
```
llama_kv_cache_init: offloading v cache to GPU
llama_kv_cache_init: offloading k cache to GPU
llama_kv_cache_init: VRAM kv self = 2048.00 MB
```
When I recently installed llama-cpp-python on a new machine, those lines no longer appear in the output and my pipeline has slowed down significantly. Can you please advise? Let me know if you need anything else.
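
In case it helps with reproduction, here is a minimal sketch of how the model gets loaded. The real pipeline goes through LlamaIndex, but the behavior is visible when calling llama_cpp directly; the model path, context size, and prompt below are illustrative rather than my exact values:

```python
# Minimal repro sketch using llama-cpp-python directly (not the full
# LlamaIndex pipeline). Path and parameter values are illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=-1,   # request that all layers be offloaded to the GPU
    n_ctx=4096,        # context window (illustrative value)
    verbose=True,      # prints the llama.cpp load log, which is where
                       # the llama_kv_cache_init lines quoted above show up
)

response = llm("Q: What does KV cache offloading do? A:", max_tokens=64)
print(response["choices"][0]["text"])
```

With `verbose=True` and identical settings, the old install printed the KV cache offloading lines quoted above during model load, while the new install does not.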