When using the Vulkan backend with llama-3-8B and nearly saturating VRAM (7.8/7.98 GB with a 16k context), the generated output becomes gibberish, often consisting of repeated letters. The issue is consistently reproducible only with llama-3-8B, and specifically when VRAM is nearly full due to an extended context.
Using codeqwen, for example, does not produce gibberish in the output even with VRAM pushed to its limits.
When the context is set too large to fit in VRAM, it simply doesn't get offloaded and the issue doesn't occur (a 24k context doesn't produce the gibberish).
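For reference, a minimal reproduction sketch. The model filename, quantization, and prompt are placeholders I'm assuming, not my exact setup, and the build flag reflects the cmake option used around this point in llama.cpp's history:

```sh
# Build with the Vulkan backend enabled
cmake -B build -DLLAMA_VULKAN=ON && cmake --build build -j

# Run llama-3-8B with a 16k context and all layers offloaded,
# which nearly saturates an 8 GB GPU and triggers the gibberish
./build/bin/main -m models/llama-3-8b.Q4_K_M.gguf -c 16384 -ngl 99 -p "Hello"
```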
I'm not sure whether this bug is related to #6874, because in my case generation breaks from the very beginning.