When using the Vulkan backend with llama-3-8B and nearly saturating VRAM (7.8/7.98 GB with a 16k context), the generated output becomes gibberish, often consisting of repeated letters. The issue is consistently reproducible only with llama-3-8B, and specifically when VRAM is nearly full due to an extended context.
Using codeqwen, for example, does not produce gibberish in the output even with VRAM pushed to its limits.
When the context is set too large to fit in VRAM, it simply doesn't get offloaded and the issue doesn't occur (a 24k context doesn't produce the gibberish).
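For reference, a minimal reproduction sketch. The model filename, quantization, and prompt are placeholders I'm assuming, not my exact setup, and the build flag reflects the cmake option used around this point in llama.cpp's history:

```sh
# Build with the Vulkan backend enabled
cmake -B build -DLLAMA_VULKAN=ON && cmake --build build -j

# Run llama-3-8B with a 16k context and all layers offloaded,
# which nearly saturates an 8 GB GPU and triggers the gibberish
./build/bin/main -m models/llama-3-8b.Q4_K_M.gguf -c 16384 -ngl 99 -p "Hello"
```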
I'm not sure whether this bug is related to #6874, because in my case generation breaks from the very beginning.