[Feature]: loading model from remote KV store such as Redis #12250

### 🚀 The feature, motivation and pitch

Currently, loading a model in `vLLM` can take a long time, up to several minutes. Three steps are involved:

1. Downloading the model files from `S3`, `HuggingFace`, or your own repo.
2. Reading, decoding, and loading each `Tensor` from the files on disk into CPU memory.
3. Copying each `Tensor` into GPU memory.

I wonder whether it would be a good idea to introduce a new class named `RemoteModelLoader` in `model_loader.py`. With it, we would only need to store the model tensors and metadata in a remote database once. After that, we could load models directly from the remote database, which should also be faster than the traditional path since the local disk is not involved.
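To make the idea concrete, here is a minimal sketch of the store-once, load-many flow. Everything in it is an assumption for illustration: the `KVStore` interface, the key layout, and the `save_model`/`load_model` method names are hypothetical and are not existing `vLLM` APIs. An in-memory dict stands in for `Redis` so the sketch runs without a server, and a `(shape, values)` tuple stands in for a real `Tensor`.

```python
import json
import struct
from typing import Dict, List, Protocol, Tuple

# Stand-in for a real tensor: (shape, flat float32 values). Hypothetical.
Tensor = Tuple[List[int], List[float]]


class KVStore(Protocol):
    """Minimal byte-oriented key-value interface (hypothetical)."""

    def set(self, key: str, value: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryKV:
    """Stand-in for a remote store such as Redis."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]


class RemoteModelLoader:
    """Sketch of the proposed loader; not the actual vLLM class."""

    def __init__(self, store: KVStore) -> None:
        self.store = store

    def save_model(self, name: str, tensors: Dict[str, Tensor]) -> None:
        # Done once: write each tensor as raw little-endian float32 bytes,
        # then write a metadata record describing all tensors.
        meta = {}
        for tname, (shape, values) in tensors.items():
            payload = struct.pack(f"<{len(values)}f", *values)
            self.store.set(f"{name}:tensor:{tname}", payload)
            meta[tname] = {"shape": shape, "dtype": "float32"}
        self.store.set(f"{name}:meta", json.dumps(meta).encode("utf-8"))

    def load_model(self, name: str) -> Dict[str, Tensor]:
        # Done on every start-up: read metadata, then fetch each tensor
        # straight from the store without touching the local disk.
        meta = json.loads(self.store.get(f"{name}:meta").decode("utf-8"))
        tensors: Dict[str, Tensor] = {}
        for tname, info in meta.items():
            raw = self.store.get(f"{name}:tensor:{tname}")
            values = list(struct.unpack(f"<{len(raw) // 4}f", raw))
            tensors[tname] = (info["shape"], values)
        return tensors
```

A real implementation would presumably swap `InMemoryKV` for a network client exposing the same `set`/`get` shape and would deserialize directly into tensors rather than Python lists.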

Besides `Redis`, I have noticed that some companies and organizations are working on RDMA-based KV databases, which should be much faster in theory. Step 3 would not even be necessary when using `GDR` (GPUDirect RDMA), since the data can be placed directly into GPU memory. In the future, those databases could also be supported through `RemoteModelLoader`, in the same way as `Redis`.
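One way such backends might plug in is through a small registry in which each backend declares whether it can deliver data directly into GPU memory. The sketch below is purely illustrative: `BackendInfo`, `register_backend`, `plan_load`, and the backend names `"redis"` and `"rdma-kv"` are all hypothetical, and the step descriptions are plain strings rather than real transfers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BackendInfo:
    """Describes a remote KV backend (hypothetical)."""

    name: str
    # True if the backend can place tensors directly into GPU memory,
    # for example via GDR, so no separate CPU-to-GPU copy is needed.
    lands_on_gpu: bool


_BACKENDS: Dict[str, BackendInfo] = {}


def register_backend(info: BackendInfo) -> None:
    """Make a backend available to the loader under its name."""
    _BACKENDS[info.name] = info


def plan_load(backend_name: str) -> List[str]:
    """Return the load steps that remain for the given backend."""
    info = _BACKENDS[backend_name]
    steps = ["fetch tensors from remote store"]
    if not info.lands_on_gpu:
        # Step 3 of the traditional path is only needed when the
        # backend delivers data into CPU memory.
        steps.append("copy tensors from CPU to GPU memory")
    return steps


register_backend(BackendInfo(name="redis", lands_on_gpu=False))
register_backend(BackendInfo(name="rdma-kv", lands_on_gpu=True))
```

With this shape, the loader itself would not need to know which store it is talking to; it would only check the capability flag to decide whether the extra copy is required.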

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
