Your current environment
def destroy(self):
    import contextlib
    import gc
    import os

    import ray
    import torch

    logger.info("vllm destroy")

    def cleanup():
        # Tear down vLLM's distributed state before freeing GPU memory.
        from vllm.distributed.parallel_state import destroy_model_parallel
        # On older vLLM versions the import path was
        # vllm.model_executor.parallel_utils.parallel_state.
        os.environ["TOKENIZERS_PARALLELISM"] = "false"
        destroy_model_parallel()
        with contextlib.suppress(AssertionError):
            torch.distributed.destroy_process_group()
        gc.collect()
        torch.cuda.empty_cache()
        ray.shutdown()

    for _ in range(10):
        cleanup()

    # Drop the references that keep the engine and its driver worker alive,
    # then collect and release cached CUDA blocks.
    del self.model.llm_engine.model_executor.driver_worker
    del self.model
    gc.collect()
    torch.cuda.empty_cache()
vllm==0.4.2
I tried this method, but it didn't work.
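A possible workaround, given that in-process teardown is unreliable here, is to run the engine in a short-lived child process so the OS reclaims all GPU memory when that process exits. Below is a minimal sketch, not the official teardown API: the model name "facebook/opt-125m" is a placeholder, and it assumes the public LLM/SamplingParams interface.

import multiprocessing as mp

def run_inference(prompts, results):
    # Build the engine inside the child process; every CUDA allocation
    # it makes is released when this process terminates.
    from vllm import LLM, SamplingParams
    llm = LLM(model="facebook/opt-125m")  # placeholder model name
    outputs = llm.generate(prompts, SamplingParams(max_tokens=32))
    results.extend(o.outputs[0].text for o in outputs)

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # CUDA requires spawn, not fork
    with ctx.Manager() as manager:
        results = manager.list()
        proc = ctx.Process(target=run_inference,
                           args=(["Hello, my name is"], results))
        proc.start()
        proc.join()  # GPU memory is fully reclaimed once the child exits
        print(list(results))

This trades one process spawn per model load for not having to delete engine internals by hand.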
How would you like to use vllm
I want to run inference of a [specific model](put link here). I don't know how to integrate it with vllm.