
Eval bug: Error running multiple contexts from multiple threads at the same time with Vulkan #11371


Description

@charlesrwest

Name and Version

This appears to be the same bug as reported in #7575.

We are trying to run inference from multiple threads, with some contexts having LoRAs loaded and others not (so batched inference isn't an option). Has there been any progress on this issue? We are currently using a build from mid-September 2024.
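For reference, a minimal sketch of the usage pattern, assuming the llama.h API from roughly that era (`llama_lora_adapter_init`/`llama_lora_adapter_set` have since been renamed upstream). The model and adapter paths and the `make_batch`/`decode_loop` helpers are placeholders, not part of llama.cpp:

```cpp
#include <thread>
#include <vector>

#include "llama.h"

// Build a single-sequence batch from pre-tokenized input.
static llama_batch make_batch(const std::vector<llama_token> & tokens) {
    llama_batch batch = llama_batch_init((int32_t) tokens.size(), 0, 1);
    for (size_t i = 0; i < tokens.size(); ++i) {
        batch.token   [batch.n_tokens]    = tokens[i];
        batch.pos     [batch.n_tokens]    = (llama_pos) i;
        batch.n_seq_id[batch.n_tokens]    = 1;
        batch.seq_id  [batch.n_tokens][0] = 0;
        batch.logits  [batch.n_tokens]    = i + 1 == tokens.size();
        batch.n_tokens++;
    }
    return batch;
}

static void decode_loop(llama_context * ctx, llama_batch batch) {
    // Concurrent entry into llama_decode from two threads is what
    // crashes in the Vulkan backend for us.
    llama_decode(ctx, batch);
}

int main() {
    llama_backend_init();

    llama_model * model = llama_load_model_from_file(
        "llama-3.2-3b-q8_0.gguf", llama_model_default_params());

    llama_context * ctx_plain = llama_new_context_with_model(model, llama_context_default_params());
    llama_context * ctx_lora  = llama_new_context_with_model(model, llama_context_default_params());

    // Only one of the two contexts carries a LoRA adapter, which is why
    // the two requests cannot simply be merged into one batched decode.
    llama_lora_adapter * adapter = llama_lora_adapter_init(model, "adapter.gguf");
    llama_lora_adapter_set(ctx_lora, adapter, 1.0f);

    std::vector<llama_token> tokens = { llama_token_bos(model) }; // stand-in prompt

    llama_batch batch_a = make_batch(tokens);
    llama_batch batch_b = make_batch(tokens);

    std::thread t1(decode_loop, ctx_plain, batch_a);
    std::thread t2(decode_loop, ctx_lora,  batch_b);
    t1.join();
    t2.join();

    llama_batch_free(batch_a);
    llama_batch_free(batch_b);
    llama_free(ctx_plain);
    llama_free(ctx_lora);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```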

Operating systems

Windows

GGML backends

Vulkan

Hardware

2x NVIDIA RTX 3090.

Models

Meta Llama 3.2 3B, 8-bit quantization.

Problem description & steps to reproduce

When we run llama_decode on different contexts from different threads, we get a crash. The only workaround we have found is to strictly serialize llama_decode calls and LoRA loading with a mutex, as sketched below.
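The workaround looks roughly like the following sketch; `decode_locked` is our own wrapper, not a llama.cpp function:

```cpp
#include <mutex>

#include "llama.h"

// Process-wide lock: every llama_decode call (and any LoRA load) goes
// through this, which avoids the crash at the cost of serializing decodes.
static std::mutex g_llama_mutex;

static int32_t decode_locked(llama_context * ctx, llama_batch batch) {
    std::lock_guard<std::mutex> lock(g_llama_mutex);
    return llama_decode(ctx, batch);
}
```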

First Bad Commit

No response

Relevant log output

The crash appears to originate in vkQueueSubmit, at line 1101.
