Vulkan outputs gibberish using extended context with vram saturated #7240

@daniandtheweb

Description

When using the Vulkan backend with the llama-3-8B model and nearly saturating the VRAM (7.8/7.98 with a 16k context), the generated output becomes gibberish, often consisting of repeated letters. The issue is consistently reproducible, but only with llama-3-8B and only when the VRAM is nearly full with an extended context.
Using codeqwen, for example, doesn't produce gibberish even with the VRAM pushed to its limit.
With a context too large to fit in the VRAM, it just doesn't get offloaded and the issue doesn't happen (a 24k context doesn't produce the gibberish).
I'm not sure this bug is related to #6874, because in my case the generation is broken from the very beginning.
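
For a rough idea of how much of the VRAM the context alone takes, here is a minimal sketch that estimates the KV cache size at different context lengths. It assumes the published Llama 3 8B shape (32 layers, 8 KV heads, head dimension 128) and an f16 KV cache. The helper name `kv_cache_bytes` is made up for illustration, and the estimate ignores the model weights and compute buffers, so real usage will be higher.

```python
def kv_cache_bytes(n_ctx, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Approximate KV cache size in bytes for a given context length.

    Defaults are assumptions: Llama 3 8B shape with an f16 (2-byte) cache.
    """
    # Per token and per layer, K and V each hold n_kv_heads * head_dim values.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * n_ctx


if __name__ == "__main__":
    for n_ctx in (8192, 16384, 24576):
        gib = kv_cache_bytes(n_ctx) / 2**30
        print(f"{n_ctx:>6} tokens: {gib:.2f} GiB KV cache")
```

Under these assumptions a 16k context needs about 2 GiB for the cache and a 24k context about 3 GiB, which would fit with the 16k case sitting right at the VRAM limit and the 24k case no longer fitting.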
