Name and Version
This appears to be the same bug as noted in this issue:
#7575
We are trying to run inference from multiple threads, with some contexts having LoRAs loaded and others not (so batched inference isn't an option). If I may ask, has there been any progress on this issue? We are currently using a build from mid-September 2024.
Operating systems
Windows
GGML backends
Vulkan
Hardware
2x Nvidia RTX 3090s.
Models
Meta Llama 3.2 3B, 8-bit quant.
Problem description & steps to reproduce
When we call llama_decode on different contexts from different threads, we get a crash. The only workaround we have found is to strictly serialize access to llama_decode and LoRA loading via a mutex.
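For reference, a minimal sketch of the mutex workaround we are using is below. The wrapper name guarded_decode and the global g_decode_mutex are our own (hypothetical) names, not part of llama.cpp; the point is simply that every llama_decode call and every LoRA adapter load, across all threads and contexts, goes through one lock.

```cpp
// Sketch of our workaround (hypothetical wrapper, not part of llama.cpp).
// Serializing all llama_decode calls and LoRA adapter loads behind a single
// global mutex avoids the crash in vkQueueSubmit.
#include <mutex>

#include "llama.h"

static std::mutex g_decode_mutex;

// Drop-in replacement for direct llama_decode calls from worker threads.
int32_t guarded_decode(llama_context * ctx, llama_batch batch) {
    std::lock_guard<std::mutex> lock(g_decode_mutex);
    return llama_decode(ctx, batch);
}

// The same lock must also be held while loading/attaching LoRA adapters
// (e.g. llama_lora_adapter_init / llama_lora_adapter_set in builds from
// that era -- exact function names vary by version).
```

Without the lock, two threads submitting work for different contexts at the same time is enough to reproduce the crash on our setup.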
First Bad Commit
No response
Relevant log output
It appears to be an error in vkQueueSubmit, line 1101.