Your current environment
def destroy(self):
    import contextlib
    import gc
    import os

    import ray
    import torch

    logger.info("vllm destroy")

    def cleanup():
        # Tear down vLLM's distributed state before freeing GPU memory.
        from vllm.distributed.parallel_state import destroy_model_parallel
        # On older vLLM versions the import path was
        # vllm.model_executor.parallel_utils.parallel_state.
        os.environ["TOKENIZERS_PARALLELISM"] = "false"
        destroy_model_parallel()
        with contextlib.suppress(AssertionError):
            torch.distributed.destroy_process_group()
        gc.collect()
        torch.cuda.empty_cache()
        ray.shutdown()

    for _ in range(10):
        cleanup()

    # Drop the references that keep the engine and its driver worker alive,
    # then collect and release cached CUDA blocks.
    del self.model.llm_engine.model_executor.driver_worker
    del self.model
    gc.collect()
    torch.cuda.empty_cache()
vllm==0.4.2
I tried this method, but it didn't work.
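A possible workaround, given that in-process teardown is unreliable here, is to run the engine in a short-lived child process so the OS reclaims all GPU memory when that process exits. Below is a minimal sketch, not the official teardown API: the model name "facebook/opt-125m" is a placeholder, and it assumes the public LLM/SamplingParams interface.

import multiprocessing as mp

def run_inference(prompts, results):
    # Build the engine inside the child process; every CUDA allocation
    # it makes is released when this process terminates.
    from vllm import LLM, SamplingParams
    llm = LLM(model="facebook/opt-125m")  # placeholder model name
    outputs = llm.generate(prompts, SamplingParams(max_tokens=32))
    results.extend(o.outputs[0].text for o in outputs)

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # CUDA requires spawn, not fork
    with ctx.Manager() as manager:
        results = manager.list()
        proc = ctx.Process(target=run_inference,
                           args=(["Hello, my name is"], results))
        proc.start()
        proc.join()  # GPU memory is fully reclaimed once the child exits
        print(list(results))

This trades one process spawn per model load for not having to delete engine internals by hand.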
How would you like to use vllm
I want to run inference of a [specific model](put link here). I don't know how to integrate it with vllm.