Description
Your current environment
1. Docker image: vllm/vllm-openai:v0.8.5
2. Model: Qwen/Qwen3-235B-A22B-FP8
3. GPU: H20
4. Launch parameters:
vllm serve /workspace/Qwen3-235B-A22B-FP8 --served-model-name=qwen3-235b --kv-cache-dtype fp8 --allow-credentials --trust-remote-code --gpu-memory-utilization=0.9 --port=30000 --host=0.0.0.0 --max-model-len=65536 --max-num-seqs=64 --tensor-parallel-size=4 --enable-expert-parallel --enable-reasoning --reasoning-parser deepseek_r1 --enable-auto-tool-choice --tool-call-parser hermes
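For reference, a minimal client call of the kind that was in flight when the crash happened (a streaming chat completion against the server above). The sampling parameters below are assumptions, chosen only so the request goes through the top-k/top-p sampling path that appears in the traceback:

```python
# Hypothetical reproduction sketch (not the exact request that failed).
# Endpoint and served model name come from the launch command above; the
# sampling parameters are assumed values that route sampling through the
# top-k/top-p (FlashInfer) path.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="qwen3-235b",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.7,           # assumed
    top_p=0.8,                 # assumed
    extra_body={"top_k": 20},  # vLLM extension field; assumed value
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```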
🐛 Describe the bug
error logs:
INFO 05-12 19:01:18 [async_llm.py:252] Added request chatcmpl-3dd5c7cd8c134f3fa6e8fc27500cc840.
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] WorkerProc hit an exception.
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] Traceback (most recent call last):
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 465, in worker_busy_loop
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] output = func(*args, **kwargs)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return func(*args, **kwargs)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 268, in execute_model
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] output = self.model_runner.execute_model(scheduler_output)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return func(*args, **kwargs)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1119, in execute_model
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] sampler_output = self.sampler(
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return self._call_impl(*args, **kwargs)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return forward_call(*args, **kwargs)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 49, in forward
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] sampled = self.sample(logits, sampling_metadata)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 115, in sample
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] random_sampled = self.topk_topp_sampler(
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return self._call_impl(*args, **kwargs)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return forward_call(*args, **kwargs)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/ops/topk_topp_sampler.py", line 109, in forward_cuda
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return flashinfer_sample(probs, k, p, generators)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/ops/topk_topp_sampler.py", line 308, in flashinfer_sample
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] if not success.all():
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] RuntimeError: CUDA error: an illegal instruction was encountered
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470]
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] Traceback (most recent call last):
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 465, in worker_busy_loop
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] output = func(*args, **kwargs)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return func(*args, **kwargs)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 268, in execute_model
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] output = self.model_runner.execute_model(scheduler_output)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return func(*args, **kwargs)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1119, in execute_model
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] sampler_output = self.sampler(
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return self._call_impl(*args, **kwargs)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return forward_call(*args, **kwargs)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 49, in forward
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] sampled = self.sample(logits, sampling_metadata)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 115, in sample
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] random_sampled = self.topk_topp_sampler(
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return self._call_impl(*args, **kwargs)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return forward_call(*args, **kwargs)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/ops/topk_topp_sampler.py", line 109, in forward_cuda
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return flashinfer_sample(probs, k, p, generators)
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/ops/topk_topp_sampler.py", line 308, in flashinfer_sample
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] if not success.all():
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] RuntimeError: CUDA error: an illegal instruction was encountered
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(VllmWorker rank=3 pid=268) ERROR 05-12 19:01:18 [multiproc_executor.py:470]
(VllmWorker rank=2 pid=267) ERROR 05-12 19:01:18 [multiproc_executor.py:470] [rank1]:[E512 19:01:18.105808661 ProcessGroupNCCL.cpp:1895] [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fbe93d6c1b6 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fbe93d15a76 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fbe941a3918 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7fbe42415556 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7fbe424228c0 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x617 (0x7fbe42424557 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7fbe424256ed in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0x145c0 (0x7fbe9453a5c0 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch.so)
frame #8: + 0x94ac3 (0x7fbe94d8fac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7fbe94e20a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fbe93d6c1b6 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fbe93d15a76 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fbe941a3918 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7fbe42415556 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7fbe424228c0 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x617 (0x7fbe42424557 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7fbe424256ed in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0x145c0 (0x7fbe9453a5c0 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch.so)
frame #8: + 0x94ac3 (0x7fbe94d8fac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7fbe94e20a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1901 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fbe93d6c1b6 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: + 0xe5c6fc (0x7fbe420806fc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #2: + 0x145c0 (0x7fbe9453a5c0 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch.so)
frame #3: + 0x94ac3 (0x7fbe94d8fac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #4: clone + 0x44 (0x7fbe94e20a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] WorkerProc hit an exception.
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] Traceback (most recent call last):
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 465, in worker_busy_loop
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] output = func(*args, **kwargs)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return func(*args, **kwargs)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 268, in execute_model
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] output = self.model_runner.execute_model(scheduler_output)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return func(*args, **kwargs)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1119, in execute_model
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] sampler_output = self.sampler(
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 49, in forward
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] sampled = self.sample(logits, sampling_metadata)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 115, in sample
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] random_sampled = self.topk_topp_sampler(
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/ops/topk_topp_sampler.py", line 109, in forward_cuda
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return flashinfer_sample(probs, k, p, generators)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/ops/topk_topp_sampler.py", line 308, in flashinfer_sample
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] if not success.all():
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] RuntimeError: CUDA error: an illegal instruction was encountered
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470]
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] Traceback (most recent call last):
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 465, in worker_busy_loop
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] output = func(*args, **kwargs)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return func(*args, **kwargs)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 268, in execute_model
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] output = self.model_runner.execute_model(scheduler_output)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return func(*args, **kwargs)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1119, in execute_model
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] sampler_output = self.sampler(
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 49, in forward
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] sampled = self.sample(logits, sampling_metadata)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 115, in sample
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] random_sampled = self.topk_topp_sampler(
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/ops/topk_topp_sampler.py", line 109, in forward_cuda
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] return flashinfer_sample(probs, k, p, generators)
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/ops/topk_topp_sampler.py", line 308, in flashinfer_sample
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] if not success.all():
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] ^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] RuntimeError: CUDA error: an illegal instruction was encountered
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470]
(VllmWorker rank=0 pid=265) ERROR 05-12 19:01:18 [multiproc_executor.py:470]
ERROR 05-12 19:01:18 [core.py:398] EngineCore encountered a fatal error.
ERROR 05-12 19:01:18 [core.py:398] Traceback (most recent call last):
ERROR 05-12 19:01:18 [core.py:398] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 389, in run_engine_core
ERROR 05-12 19:01:18 [core.py:398] engine_core.run_busy_loop()
ERROR 05-12 19:01:18 [core.py:398] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 413, in run_busy_loop
ERROR 05-12 19:01:18 [core.py:398] self._process_engine_step()
ERROR 05-12 19:01:18 [core.py:398] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 438, in _process_engine_step
ERROR 05-12 19:01:18 [core.py:398] outputs = self.step_fn()
ERROR 05-12 19:01:18 [core.py:398] ^^^^^^^^^^^^^^
ERROR 05-12 19:01:18 [core.py:398] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 203, in step
ERROR 05-12 19:01:18 [core.py:398] output = self.model_executor.execute_model(scheduler_output)
ERROR 05-12 19:01:18 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-12 19:01:18 [core.py:398] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 146, in execute_model
ERROR 05-12 19:01:18 [core.py:398] (output, ) = self.collective_rpc("execute_model",
ERROR 05-12 19:01:18 [core.py:398] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-12 19:01:18 [core.py:398] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 185, in collective_rpc
ERROR 05-12 19:01:18 [core.py:398] raise RuntimeError(
ERROR 05-12 19:01:18 [core.py:398] RuntimeError: Worker failed with error 'CUDA error: an illegal instruction was encountered
ERROR 05-12 19:01:18 [core.py:398] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 05-12 19:01:18 [core.py:398] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
ERROR 05-12 19:01:18 [core.py:398] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR 05-12 19:01:18 [core.py:398] ', please check the stack trace above for the root cause
ERROR 05-12 19:01:18 [async_llm.py:399] AsyncLLM output_handler failed.
ERROR 05-12 19:01:18 [async_llm.py:399] Traceback (most recent call last):
ERROR 05-12 19:01:18 [async_llm.py:399] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 357, in output_handler
ERROR 05-12 19:01:18 [async_llm.py:399] outputs = await engine_core.get_output_async()
ERROR 05-12 19:01:18 [async_llm.py:399] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-12 19:01:18 [async_llm.py:399] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 716, in get_output_async
ERROR 05-12 19:01:18 [async_llm.py:399] raise self._format_exception(outputs) from None
ERROR 05-12 19:01:18 [async_llm.py:399] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO 05-12 19:01:18 [async_llm.py:324] Request chatcmpl-e8dc947097064d1ebc3b6062f568a6f6 failed (engine dead).
ERROR 05-12 19:01:18 [serving_chat.py:885] Error in chat completion stream generator.
ERROR 05-12 19:01:18 [serving_chat.py:885] Traceback (most recent call last):
ERROR 05-12 19:01:18 [serving_chat.py:885] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 487, in chat_completion_stream_generator
ERROR 05-12 19:01:18 [serving_chat.py:885] async for res in result_generator:
ERROR 05-12 19:01:18 [serving_chat.py:885] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 306, in generate
ERROR 05-12 19:01:18 [serving_chat.py:885] out = q.get_nowait() or await q.get()
ERROR 05-12 19:01:18 [serving_chat.py:885] ^^^^^^^^^^^^^
ERROR 05-12 19:01:18 [serving_chat.py:885] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/output_processor.py", line 51, in get
ERROR 05-12 19:01:18 [serving_chat.py:885] raise output
ERROR 05-12 19:01:18 [serving_chat.py:885] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 357, in output_handler
ERROR 05-12 19:01:18 [serving_chat.py:885] outputs = await engine_core.get_output_async()
ERROR 05-12 19:01:18 [serving_chat.py:885] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-12 19:01:18 [serving_chat.py:885] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 716, in get_output_async
ERROR 05-12 19:01:18 [serving_chat.py:885] raise self._format_exception(outputs) from None
ERROR 05-12 19:01:18 [serving_chat.py:885] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO 05-12 19:01:18 [async_llm.py:324] Request chatcmpl-3dd5c7cd8c134f3fa6e8fc27500cc840 failed (engine dead).
(VllmWorker rank=2 pid=267) Process VllmWorker-2:
(VllmWorker rank=0 pid=265) Process VllmWorker-0:
INFO: 172.21.0.136:47946 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(VllmWorker rank=3 pid=268) Process VllmWorker-3:
(VllmWorker rank=3 pid=268) Traceback (most recent call last):
(VllmWorker rank=2 pid=267) Traceback (most recent call last):
(VllmWorker rank=0 pid=265) Traceback (most recent call last):
(VllmWorker rank=3 pid=268) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 426, in worker_main
(VllmWorker rank=3 pid=268) worker.worker_busy_loop()
(VllmWorker rank=3 pid=268) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 458, in worker_busy_loop
(VllmWorker rank=3 pid=268) method, args, kwargs, rank0_only = self.rpc_broadcast_mq.dequeue()
(VllmWorker rank=3 pid=268) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=267) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 426, in worker_main
(VllmWorker rank=2 pid=267) worker.worker_busy_loop()
(VllmWorker rank=3 pid=268) File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 479, in dequeue
(VllmWorker rank=3 pid=268) with self.acquire_read(timeout, cancel) as buf:
(VllmWorker rank=2 pid=267) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 458, in worker_busy_loop
(VllmWorker rank=3 pid=268) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=267) method, args, kwargs, rank0_only = self.rpc_broadcast_mq.dequeue()
(VllmWorker rank=3 pid=268) File "/usr/lib/python3.12/contextlib.py", line 137, in __enter__
(VllmWorker rank=2 pid=267) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) return next(self.gen)
(VllmWorker rank=3 pid=268) ^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=267) File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 479, in dequeue
(VllmWorker rank=0 pid=265) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 426, in worker_main
(VllmWorker rank=3 pid=268) File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 412, in acquire_read
(VllmWorker rank=2 pid=267) with self.acquire_read(timeout, cancel) as buf:
(VllmWorker rank=0 pid=265) worker.worker_busy_loop()
(VllmWorker rank=3 pid=268) with self.buffer.get_metadata(self.current_idx) as metadata_buffer:
(VllmWorker rank=2 pid=267) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 458, in worker_busy_loop
(VllmWorker rank=2 pid=267) File "/usr/lib/python3.12/contextlib.py", line 137, in __enter__
(VllmWorker rank=0 pid=265) method, args, kwargs, rank0_only = self.rpc_broadcast_mq.dequeue()
(VllmWorker rank=2 pid=267) return next(self.gen)
(VllmWorker rank=3 pid=268) File "/usr/lib/python3.12/contextlib.py", line 301, in helper
(VllmWorker rank=0 pid=265) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=267) ^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) return _GeneratorContextManager(func, args, kwds)
(VllmWorker rank=0 pid=265) File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 479, in dequeue
(VllmWorker rank=3 pid=268) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=267) File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 412, in acquire_read
(VllmWorker rank=0 pid=265) with self.acquire_read(timeout, cancel) as buf:
(VllmWorker rank=3 pid=268) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 398, in signal_handler
(VllmWorker rank=2 pid=267) with self.buffer.get_metadata(self.current_idx) as metadata_buffer:
(VllmWorker rank=0 pid=265) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) raise SystemExit()
(VllmWorker rank=2 pid=267) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=265) File "/usr/lib/python3.12/contextlib.py", line 137, in __enter__
(VllmWorker rank=2 pid=267) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 398, in signal_handler
(VllmWorker rank=0 pid=265) return next(self.gen)
(VllmWorker rank=2 pid=267) raise SystemExit()
(VllmWorker rank=0 pid=265) ^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=268) SystemExit
(VllmWorker rank=0 pid=265) File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 425, in acquire_read
(VllmWorker rank=0 pid=265) sched_yield()
(VllmWorker rank=2 pid=267) SystemExit
(VllmWorker rank=0 pid=265) File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 41, in sched_yield
(VllmWorker rank=2 pid=267)
(VllmWorker rank=3 pid=268)
(VllmWorker rank=0 pid=265) os.sched_yield()
(VllmWorker rank=2 pid=267) During handling of the above exception, another exception occurred:
(VllmWorker rank=3 pid=268) During handling of the above exception, another exception occurred:
(VllmWorker rank=2 pid=267)
(VllmWorker rank=0 pid=265) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 398, in signal_handler
(VllmWorker rank=3 pid=268)
(VllmWorker rank=0 pid=265) raise SystemExit()
(VllmWorker rank=2 pid=267) Traceback (most recent call last):
(VllmWorker rank=3 pid=268) Traceback (most recent call last):
(VllmWorker rank=0 pid=265) SystemExit
(VllmWorker rank=0 pid=265)
(VllmWorker rank=0 pid=265) During handling of the above exception, another exception occurred:
(VllmWorker rank=0 pid=265)
(VllmWorker rank=0 pid=265) Traceback (most recent call last):
(VllmWorker rank=3 pid=268) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(VllmWorker rank=2 pid=267) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(VllmWorker rank=3 pid=268) self.run()
(VllmWorker rank=2 pid=267) self.run()
(VllmWorker rank=3 pid=268) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(VllmWorker rank=2 pid=267) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(VllmWorker rank=3 pid=268) self._target(*self._args, **self._kwargs)
(VllmWorker rank=2 pid=267) self._target(*self._args, **self._kwargs)
(VllmWorker rank=3 pid=268) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 449, in worker_main
(VllmWorker rank=2 pid=267) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 449, in worker_main
(VllmWorker rank=3 pid=268) worker.shutdown()
(VllmWorker rank=2 pid=267) worker.shutdown()
(VllmWorker rank=3 pid=268) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 381, in shutdown
(VllmWorker rank=2 pid=267) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 381, in shutdown
(VllmWorker rank=3 pid=268) destroy_model_parallel()
(VllmWorker rank=2 pid=267) destroy_model_parallel()
(VllmWorker rank=0 pid=265) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(VllmWorker rank=3 pid=268) File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 1085, in destroy_model_parallel
(VllmWorker rank=2 pid=267) File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 1085, in destroy_model_parallel
(VllmWorker rank=0 pid=265) self.run()
(VllmWorker rank=3 pid=268) _TP.destroy()
(VllmWorker rank=2 pid=267) _TP.destroy()
(VllmWorker rank=0 pid=265) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(VllmWorker rank=3 pid=268) File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 750, in destroy
(VllmWorker rank=0 pid=265) self._target(*self._args, **self._kwargs)
(VllmWorker rank=2 pid=267) File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 750, in destroy
(VllmWorker rank=3 pid=268) torch.distributed.destroy_process_group(self.device_group)
(VllmWorker rank=0 pid=265) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 449, in worker_main
(VllmWorker rank=2 pid=267) torch.distributed.destroy_process_group(self.device_group)
(VllmWorker rank=3 pid=268) File "/usr/local/lib/python3.12/dist-packages/torch/distributed/distributed_c10d.py", line 2168, in destroy_process_group
(VllmWorker rank=0 pid=265) worker.shutdown()
(VllmWorker rank=2 pid=267) File "/usr/local/lib/python3.12/dist-packages/torch/distributed/distributed_c10d.py", line 2168, in destroy_process_group
(VllmWorker rank=3 pid=268) _shutdown_backend(pg)
(VllmWorker rank=0 pid=265) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 381, in shutdown
(VllmWorker rank=2 pid=267) _shutdown_backend(pg)
(VllmWorker rank=3 pid=268) File "/usr/local/lib/python3.12/dist-packages/torch/distributed/distributed_c10d.py", line 1815, in _shutdown_backend
(VllmWorker rank=0 pid=265) destroy_model_parallel()
(VllmWorker rank=2 pid=267) File "/usr/local/lib/python3.12/dist-packages/torch/distributed/distributed_c10d.py", line 1815, in _shutdown_backend
(VllmWorker rank=3 pid=268) backend._shutdown()
(VllmWorker rank=0 pid=265) File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 1085, in destroy_model_parallel
(VllmWorker rank=2 pid=267) backend._shutdown()
(VllmWorker rank=0 pid=265) _TP.destroy()
(VllmWorker rank=0 pid=265) File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 750, in destroy
(VllmWorker rank=3 pid=268) torch.distributed.DistBackendError: NCCL error in: /pytorch/torch/csrc/distributed/c10d/NCCLUtils.cpp:133, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.21.5
(VllmWorker rank=0 pid=265) torch.distributed.destroy_process_group(self.device_group)
(VllmWorker rank=2 pid=267) torch.distributed.DistBackendError: NCCL error in: /pytorch/torch/csrc/distributed/c10d/NCCLUtils.cpp:133, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.21.5
(VllmWorker rank=3 pid=268) ncclUnhandledCudaError: Call to CUDA function failed.
(VllmWorker rank=0 pid=265) File "/usr/local/lib/python3.12/dist-packages/torch/distributed/distributed_c10d.py", line 2168, in destroy_process_group
(VllmWorker rank=3 pid=268) Last error:
(VllmWorker rank=2 pid=267) ncclUnhandledCudaError: Call to CUDA function failed.
(VllmWorker rank=0 pid=265) _shutdown_backend(pg)
(VllmWorker rank=3 pid=268) Cuda failure 'an illegal instruction was encountered'
(VllmWorker rank=2 pid=267) Last error:
(VllmWorker rank=0 pid=265) File "/usr/local/lib/python3.12/dist-packages/torch/distributed/distributed_c10d.py", line 1815, in _shutdown_backend
(VllmWorker rank=2 pid=267) Cuda failure 'an illegal instruction was encountered'
(VllmWorker rank=0 pid=265) backend._shutdown()
(VllmWorker rank=0 pid=265) torch.distributed.DistBackendError: NCCL error in: /pytorch/torch/csrc/distributed/c10d/NCCLUtils.cpp:133, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.21.5
(VllmWorker rank=0 pid=265) ncclUnhandledCudaError: Call to CUDA function failed.
(VllmWorker rank=0 pid=265) Last error:
(VllmWorker rank=0 pid=265) Cuda failure 'an illegal instruction was encountered'
terminate called after throwing an instance of 'c10::Error'
terminate called after throwing an instance of 'c10::Error'
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fe94736c1b6 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fe947315a76 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fe9477d7918 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x103ad78 (0x7fe8f545ed78 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0x10433c5 (0x7fe8f54673c5 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0x643a72 (0x7fe93eccba72 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #6: + 0x6f30f (0x7fe94734d30f in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #7: c10::TensorImpl::~TensorImpl() + 0x21b (0x7fe94734633b in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #8: c10::TensorImpl::~TensorImpl() + 0x9 (0x7fe9473464e9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #9: + 0x906d38 (0x7fe93ef8ed38 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #10: THPVariable_subclass_dealloc(_object*) + 0x300 (0x7fe93ef8f090 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #11: + 0x12e7cd (0x7fe887c087cd in /usr/local/lib/python3.12/dist-packages/numpy/_core/_multiarray_umath.cpython-312-x86_64-linux-gnu.so)
frame #12: /usr/bin/python3() [0x59bf80]
frame #13: /usr/bin/python3() [0x53bea4]
frame #14: /usr/bin/python3() [0x59bf5d]
frame #15: /usr/bin/python3() [0x53bea4]
frame #16: /usr/bin/python3() [0x59bf5d]
frame #17: /usr/bin/python3() [0x53bea4]
frame #18: /usr/bin/python3() [0x59bf5d]
frame #19: /usr/bin/python3() [0x59be14]
frame #20: /usr/bin/python3() [0x59be14]
frame #21: /usr/bin/python3() [0x57ccc0]
frame #22: /usr/bin/python3() [0x57bcf6]
frame #23: /usr/bin/python3() [0x533d9a]
frame #24: /usr/bin/python3() [0x659557]
frame #25: /usr/bin/python3() [0x594d67]
frame #26: /usr/bin/python3() [0x59bdd6]
frame #27: _PyEval_EvalFrameDefault + 0x50e7 (0x54cb27 in /usr/bin/python3)
frame #28: PyEval_EvalCode + 0x99 (0x61d5b9 in /usr/bin/python3)
frame #29: /usr/bin/python3() [0x6591db]
frame #30: /usr/bin/python3() [0x654346]
frame #31: PyRun_StringFlags + 0x63 (0x6503b3 in /usr/bin/python3)
frame #32: PyRun_SimpleStringFlags + 0x3e (0x6500be in /usr/bin/python3)
frame #33: Py_RunMain + 0x4b2 (0x64d622 in /usr/bin/python3)
frame #34: Py_BytesMain + 0x2d (0x6064ad in /usr/bin/python3)
frame #35: + 0x29d90 (0x7fe94803ed90 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #36: __libc_start_main + 0x80 (0x7fe94803ee40 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #37: _start + 0x25 (0x606325 in /usr/bin/python3)
what(): CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f46d4b6c1b6 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f46d4b15a76 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f46d4fa3918 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x103ad78 (0x7f468305ed78 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0x10433c5 (0x7f46830673c5 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0x643a72 (0x7f46cc8cba72 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #6: + 0x6f30f (0x7f46d4b4d30f in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #7: c10::TensorImpl::~TensorImpl() + 0x21b (0x7f46d4b4633b in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #8: c10::TensorImpl::~TensorImpl() + 0x9 (0x7f46d4b464e9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #9: + 0x906d38 (0x7f46ccb8ed38 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #10: THPVariable_subclass_dealloc(_object*) + 0x300 (0x7f46ccb8f090 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #11: + 0x12e7cd (0x7f46158087cd in /usr/local/lib/python3.12/dist-packages/numpy/_core/_multiarray_umath.cpython-312-x86_64-linux-gnu.so)
frame #12: /usr/bin/python3() [0x59bf80]
frame #13: /usr/bin/python3() [0x53bea4]
frame #14: /usr/bin/python3() [0x59bf5d]
frame #15: /usr/bin/python3() [0x53bea4]
frame #16: /usr/bin/python3() [0x59bf5d]
frame #17: /usr/bin/python3() [0x53bea4]
frame #18: /usr/bin/python3() [0x59bf5d]
frame #19: /usr/bin/python3() [0x59be14]
frame #20: /usr/bin/python3() [0x59be14]
frame #21: /usr/bin/python3() [0x57ccc0]
frame #22: /usr/bin/python3() [0x57bcf6]
frame #23: /usr/bin/python3() [0x533d9a]
frame #24: /usr/bin/python3() [0x659557]
frame #25: /usr/bin/python3() [0x594d67]
frame #26: /usr/bin/python3() [0x59bdd6]
frame #27: _PyEval_EvalFrameDefault + 0x50e7 (0x54cb27 in /usr/bin/python3)
frame #28: PyEval_EvalCode + 0x99 (0x61d5b9 in /usr/bin/python3)
frame #29: /usr/bin/python3() [0x6591db]
frame #30: /usr/bin/python3() [0x654346]
frame #31: PyRun_StringFlags + 0x63 (0x6503b3 in /usr/bin/python3)
frame #32: PyRun_SimpleStringFlags + 0x3e (0x6500be in /usr/bin/python3)
frame #33: Py_RunMain + 0x4b2 (0x64d622 in /usr/bin/python3)
frame #34: Py_BytesMain + 0x2d (0x6064ad in /usr/bin/python3)
frame #35: + 0x29d90 (0x7f46d5b2fd90 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #36: __libc_start_main + 0x80 (0x7f46d5b2fe40 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #37: _start + 0x25 (0x606325 in /usr/bin/python3)
what(): CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fa9cd16c1b6 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fa9cd115a76 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fa9cd5f2918 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x103ad78 (0x7fa97b25ed78 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0x10433c5 (0x7fa97b2673c5 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0x643a72 (0x7fa9c4acba72 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #6: + 0x6f30f (0x7fa9cd14d30f in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #7: c10::TensorImpl::~TensorImpl() + 0x21b (0x7fa9cd14633b in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #8: c10::TensorImpl::~TensorImpl() + 0x9 (0x7fa9cd1464e9 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #9: + 0x906d38 (0x7fa9c4d8ed38 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #10: THPVariable_subclass_dealloc(_object*) + 0x300 (0x7fa9c4d8f090 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #11: + 0x12e7cd (0x7fa90da087cd in /usr/local/lib/python3.12/dist-packages/numpy/_core/_multiarray_umath.cpython-312-x86_64-linux-gnu.so)
frame #12: /usr/bin/python3() [0x59bf80]
frame #13: /usr/bin/python3() [0x53bea4]
frame #14: /usr/bin/python3() [0x59bf5d]
frame #15: /usr/bin/python3() [0x53bea4]
frame #16: /usr/bin/python3() [0x59bf5d]
frame #17: /usr/bin/python3() [0x53bea4]
frame #18: /usr/bin/python3() [0x59bf5d]
frame #19: /usr/bin/python3() [0x59be14]
frame #20: /usr/bin/python3() [0x59be14]
frame #21: /usr/bin/python3() [0x57ccc0]
frame #22: /usr/bin/python3() [0x57bcf6]
frame #23: /usr/bin/python3() [0x533d9a]
frame #24: /usr/bin/python3() [0x659557]
frame #25: /usr/bin/python3() [0x594d67]
frame #26: /usr/bin/python3() [0x59bdd6]
frame #27: _PyEval_EvalFrameDefault + 0x50e7 (0x54cb27 in /usr/bin/python3)
frame #28: PyEval_EvalCode + 0x99 (0x61d5b9 in /usr/bin/python3)
frame #29: /usr/bin/python3() [0x6591db]
frame #30: /usr/bin/python3() [0x654346]
frame #31: PyRun_StringFlags + 0x63 (0x6503b3 in /usr/bin/python3)
frame #32: PyRun_SimpleStringFlags + 0x3e (0x6500be in /usr/bin/python3)
frame #33: Py_RunMain + 0x4b2 (0x64d622 in /usr/bin/python3)
frame #34: Py_BytesMain + 0x2d (0x6064ad in /usr/bin/python3)
frame #35: + 0x29d90 (0x7fa9cde59d90 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #36: __libc_start_main + 0x80 (0x7fa9cde59e40 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #37: _start + 0x25 (0x606325 in /usr/bin/python3)
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [48]
/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 2 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
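For context on why the traceback ends at `if not success.all():` in topk_topp_sampler.py even though the illegal instruction comes from a CUDA kernel: converting a GPU tensor to a Python bool forces a host/device synchronization, so any asynchronously reported CUDA error surfaces at that call. A small self-contained sketch of that pattern (assumptions only, not the actual vLLM source):

```python
# Simplified stand-in for the flashinfer_sample() call site seen in the
# traceback. The real illegal-instruction error originates in a GPU kernel;
# it is only *reported* when the host next synchronizes with the device,
# which happens when the boolean `success` tensor is evaluated.
import torch

def flashinfer_sample_sketch(probs: torch.Tensor) -> torch.Tensor:
    # Stand-in for the rejection sampler: returns token ids plus a per-row
    # "success" mask (shapes here are assumptions for illustration).
    sampled = torch.argmax(probs, dim=-1)
    success = torch.ones(probs.shape[0], dtype=torch.bool, device=probs.device)
    # bool(success.all()) is a sync point; a pending CUDA error from any
    # earlier kernel would be raised here as a RuntimeError.
    if not success.all():
        sampled = torch.multinomial(probs, num_samples=1).squeeze(-1)
    return sampled

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    probs = torch.softmax(torch.randn(4, 32, device=device), dim=-1)
    print(flashinfer_sample_sketch(probs))
```

Running with CUDA_LAUNCH_BLOCKING=1, as the log itself suggests, should move the reported error closer to the kernel that actually faulted.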
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.