Closed
Labels: bug (Something isn't working)
Description
Your current environment
The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 24.04.1 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version: Could not collect
CMake version: version 3.31.4
Libc version: glibc-2.39
Python version: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: 12.6.20
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4070 Laptop GPU
Nvidia driver version: 566.36
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: GenuineIntel
Model name: 13th Gen Intel(R) Core(TM) i7-13620H
CPU family: 6
Model: 186
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 2
BogoMIPS: 5836.80
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization: VT-x
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 384 KiB (8 instances)
L1i cache: 256 KiB (8 instances)
L2 cache: 10 MiB (8 instances)
L3 cache: 24 MiB (1 instance)
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-ml-py3==7.352.0
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] pynvml==11.5.3
[pip3] pyzmq==26.2.0
[pip3] torch==2.5.1
[pip3] torchaudio==2.5.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.48.1
[pip3] triton==3.1.0
[pip3] vllm_nccl_cu12==2.18.1.0.4.0
[conda] blas 1.0 mkl
[conda] cuda-cccl 12.6.77 0 nvidia
[conda] cuda-cccl_linux-64 12.6.77 0 nvidia
[conda] cuda-command-line-tools 12.1.1 0 nvidia
[conda] cuda-compiler 12.6.2 0 nvidia
[conda] cuda-crt-dev_linux-64 12.6.20 0 nvidia
[conda] cuda-crt-tools 12.6.20 0 nvidia
[conda] cuda-cudart 12.1.105 0 nvidia
[conda] cuda-cudart-dev 12.1.105 0 nvidia
[conda] cuda-cudart-dev_linux-64 12.6.77 0 nvidia
[conda] cuda-cudart-static 12.6.77 0 nvidia
[conda] cuda-cudart-static_linux-64 12.6.77 0 nvidia
[conda] cuda-cudart_linux-64 12.6.77 0 nvidia
[conda] cuda-cuobjdump 12.6.77 0 nvidia
[conda] cuda-cupti 12.1.105 0 nvidia
[conda] cuda-cuxxfilt 12.6.77 0 nvidia
[conda] cuda-documentation 12.4.127 0 nvidia
[conda] cuda-driver-dev 12.6.77 0 nvidia
[conda] cuda-driver-dev_linux-64 12.6.77 0 nvidia
[conda] cuda-gdb 12.6.77 0 nvidia
[conda] cuda-libraries 12.1.0 0 nvidia
[conda] cuda-libraries-dev 12.6.2 0 nvidia
[conda] cuda-libraries-static 12.6.2 0 nvidia
[conda] cuda-nsight 12.6.77 0 nvidia
[conda] cuda-nvcc 12.6.20 0 nvidia
[conda] cuda-nvcc-dev_linux-64 12.6.20 0 nvidia
[conda] cuda-nvcc-impl 12.6.20 0 nvidia
[conda] cuda-nvcc-tools 12.6.20 0 nvidia
[conda] cuda-nvcc_linux-64 12.6.20 0 nvidia
[conda] cuda-nvdisasm 12.6.77 0 nvidia
[conda] cuda-nvml-dev 12.6.77 2 nvidia
[conda] cuda-nvprof 12.6.80 0 nvidia
[conda] cuda-nvprune 12.6.77 0 nvidia
[conda] cuda-nvrtc 12.1.105 0 nvidia
[conda] cuda-nvrtc-dev 12.1.105 0 nvidia
[conda] cuda-nvrtc-static 12.6.85 0 nvidia
[conda] cuda-nvtx 12.1.105 0 nvidia
[conda] cuda-nvvm-dev_linux-64 12.6.20 0 nvidia
[conda] cuda-nvvm-impl 12.6.20 0 nvidia
[conda] cuda-nvvm-tools 12.6.20 0 nvidia
[conda] cuda-nvvp 12.6.80 0 nvidia
[conda] cuda-opencl 12.6.77 0 nvidia
[conda] cuda-opencl-dev 12.6.77 0 nvidia
[conda] cuda-profiler-api 12.6.77 0 nvidia
[conda] cuda-runtime 12.1.0 0 nvidia
[conda] cuda-sanitizer-api 12.6.77 0 nvidia
[conda] cuda-toolkit 12.1.0 0 nvidia
[conda] cuda-tools 12.1.1 0 nvidia
[conda] cuda-version 12.6 3 nvidia
[conda] cuda-visual-tools 12.6.2 0 nvidia
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] gds-tools 1.11.1.6 0 nvidia
[conda] libcublas 12.1.0.26 0 nvidia
[conda] libcublas-dev 12.1.0.26 0 nvidia
[conda] libcublas-static 12.6.4.1 0 nvidia
[conda] libcufft 11.0.2.4 0 nvidia
[conda] libcufft-dev 11.0.2.4 0 nvidia
[conda] libcufft-static 11.3.0.4 0 nvidia
[conda] libcufile 1.11.1.6 0 nvidia
[conda] libcufile-dev 1.11.1.6 0 nvidia
[conda] libcufile-static 1.11.1.6 0 nvidia
[conda] libcurand 10.3.7.77 0 nvidia
[conda] libcurand-dev 10.3.7.77 0 nvidia
[conda] libcurand-static 10.3.7.77 0 nvidia
[conda] libcusolver 11.4.4.55 0 nvidia
[conda] libcusolver-dev 11.4.4.55 0 nvidia
[conda] libcusolver-static 11.7.1.2 0 nvidia
[conda] libcusparse 12.0.2.55 0 nvidia
[conda] libcusparse-dev 12.0.2.55 0 nvidia
[conda] libcusparse-static 12.5.4.2 0 nvidia
[conda] libjpeg-turbo 2.0.0 h9bf148f_0 pytorch
[conda] libnpp 12.0.2.50 0 nvidia
[conda] libnpp-dev 12.0.2.50 0 nvidia
[conda] libnpp-static 12.3.1.54 0 nvidia
[conda] libnvfatbin 12.6.77 0 nvidia
[conda] libnvfatbin-dev 12.6.77 0 nvidia
[conda] libnvfatbin-static 12.6.77 0 nvidia
[conda] libnvjitlink 12.1.105 0 nvidia
[conda] libnvjitlink-dev 12.1.105 0 nvidia
[conda] libnvjitlink-static 12.6.85 0 nvidia
[conda] libnvjpeg 12.1.1.14 0 nvidia
[conda] libnvjpeg-dev 12.1.1.14 0 nvidia
[conda] libnvjpeg-static 12.3.3.54 0 nvidia
[conda] libnvvm-samples 12.1.105 0 nvidia
[conda] mkl 2023.1.0 h213fc3f_46344
[conda] mkl-service 2.4.0 py310h5eee18b_2
[conda] mkl_fft 1.3.11 py310h5eee18b_0
[conda] mkl_random 1.2.8 py310h1128e8f_0
[conda] nsight-compute 2024.3.2.3 0 nvidia
[conda] numpy 1.26.4 pypi_0 pypi
[conda] nvidia-cublas-cu12 12.4.5.8 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.2.1.3 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.5.147 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.6.1.9 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.3.1.170 pypi_0 pypi
[conda] nvidia-ml-py 12.560.30 pypi_0 pypi
[conda] nvidia-ml-py3 7.352.0 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.21.5 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.4.127 pypi_0 pypi
[conda] pynvml 11.5.3 pypi_0 pypi
[conda] pytorch-cuda 12.1 ha16c6d3_6 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] pyzmq 26.2.0 py310h71f11fc_3 conda-forge
[conda] torch 2.5.1 pypi_0 pypi
[conda] torchaudio 2.5.1 py310_cu121 pytorch
[conda] torchvision 0.20.1 py310_cu121 pytorch
[conda] transformers 4.48.1 pypi_0 pypi
[conda] triton 3.1.0 pypi_0 pypi
[conda] vllm-nccl-cu12 2.18.1.0.4.0 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.6.post1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
LD_LIBRARY_PATH=/home/sharaf/anaconda3/envs/aimo/lib/python3.10/site-packages/cv2/../../lib64:
CUDA_MODULE_LOADING=LAZY
Model Input Dumps
No response
🐛 Describe the bug
description
- the current vLLM version tries to read the CUDA compute capability via
pynvml.nvmlDeviceGetCudaComputeCapability(handle)
(vllm/platforms/cuda.py:160), but instead raises:
AttributeError: module 'pynvml' has no attribute 'nvmlDeviceGetCudaComputeCapability'
- resolved when I downgraded pynvml from 12.0.0 to 11.5.3
When I asked the chatbot on the docs page, it mentioned that the `nvidia-ml-py` upgrade shipped as a fix for an earlier issue (https://github.com//issues/9821), which is a possible root cause of this issue.

suggested action
I'm still not sure whether this is specific to my setup, but from my results there appears to be an incompatibility between vLLM and pynvml.
After some searching I found that pynvml mirrors nvidia-ml-py, which is listed in the dependency tree of this version of vLLM.
Without further research, I'd suggest pinning nvidia-ml-py in the required dependencies back to a version compatible with pynvml==11.5.3.
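To illustrate the failure mode without a GPU, here is a small sketch of the kind of defensive guard that would surface a clearer error (my own illustration, not vLLM's actual code or fix — the function name and message are made up; it only checks for the missing NVML attribute):

```python
import types

def get_compute_capability(nvml):
    """Return (major, minor) CUDA compute capability via an NVML binding,
    guarding against pynvml releases that no longer expose
    nvmlDeviceGetCudaComputeCapability."""
    if not hasattr(nvml, "nvmlDeviceGetCudaComputeCapability"):
        # This is the condition hit with pynvml 12.0.0 in my environment.
        raise RuntimeError(
            "installed pynvml lacks nvmlDeviceGetCudaComputeCapability; "
            "try pinning pynvml==11.5.3")
    handle = nvml.nvmlDeviceGetHandleByIndex(0)
    return nvml.nvmlDeviceGetCudaComputeCapability(handle)

# Simulate a compatible binding (no GPU needed) to show the guard passing.
compatible = types.SimpleNamespace(
    nvmlDeviceGetHandleByIndex=lambda i: object(),
    nvmlDeviceGetCudaComputeCapability=lambda h: (8, 9),
)
print(get_compute_capability(compatible))  # (8, 9)
```

With a module object that lacks the attribute (as pynvml 12.0.0 did for me), the guard raises the RuntimeError instead of the bare AttributeError.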
responsible code
vllm = LLM(
    model=config.base_model if config.base_model else config.model_id,
    tensor_parallel_size=num_gpus,
    dtype="auto",
    quantization=None,
    gpu_memory_utilization=0.99,
    max_num_seqs=128,
    max_model_len=2048,
    enable_lora=True,
)
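As a user-side stopgap, one could fail fast before constructing LLM by checking the installed pynvml version (a hypothetical helper — the ">= 12 is incompatible" boundary is only inferred from my downgrade result above, not confirmed by vLLM):

```python
from importlib.metadata import PackageNotFoundError, version

def pynvml_is_compatible(ver: str) -> bool:
    """Heuristic: treat pynvml >= 12 as incompatible with this vLLM release,
    based only on the downgrade result reported in this issue."""
    major = int(ver.split(".")[0])
    return major < 12

def check_installed_pynvml() -> None:
    """Raise a clear error before LLM(...) is built, instead of the
    AttributeError deep inside vllm/platforms/cuda.py."""
    try:
        ver = version("pynvml")
    except PackageNotFoundError:
        return  # nothing to check if pynvml is absent
    if not pynvml_is_compatible(ver):
        raise RuntimeError(
            f"pynvml {ver} may be incompatible with this vLLM version; "
            "try pip install pynvml==11.5.3")

print(pynvml_is_compatible("11.5.3"), pynvml_is_compatible("12.0.0"))  # True False
```

Calling check_installed_pynvml() just before the LLM(...) call would have turned the traceback below into a one-line actionable message.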
error message
Cell In[4], line 14
     12 else:
     13     quantization = None
---> 14 vllm = LLM(
     15     model=config.base_model if config.base_model else config.model_id,
     16     #tokenizer=config.base_model,
     17     tensor_parallel_size=num_gpus,
     18     dtype="auto",
     19     quantization=None,
     20     #quantization="bitsandbytes",
     21     #load_format="bitsandbytes",
     22     #swap_space=2,
     23     gpu_memory_utilization=0.99,
     24     #cpu_offload_gb=3,
     25     max_num_seqs=128,
     26     max_model_len=2048,
     27     #enforce_eager=True,
     28     enable_lora=True,
     29 )
     30 return vllm

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/utils.py:986, in deprecate_args.<locals>.wrapper.<locals>.inner(*args, **kwargs)
    979     msg += f" {additional_message}"
    981 warnings.warn(
    982     DeprecationWarning(msg),
    983     stacklevel=3,  # The inner function takes up one level
    984 )
--> 986 return fn(*args, **kwargs)

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/entrypoints/llm.py:230, in LLM.__init__(self, model, tokenizer, tokenizer_mode, skip_tokenizer_init, trust_remote_code, allowed_local_media_path, tensor_parallel_size, dtype, quantization, revision, tokenizer_revision, seed, gpu_memory_utilization, swap_space, cpu_offload_gb, enforce_eager, max_seq_len_to_capture, disable_custom_all_reduce, disable_async_output_proc, hf_overrides, mm_processor_kwargs, task, override_pooler_config, compilation_config, **kwargs)
    227 self.engine_class = self.get_engine_class()
    229 # TODO(rob): enable mp by default (issue with fork vs spawn)
--> 230 self.llm_engine = self.engine_class.from_engine_args(
    231     engine_args, usage_context=UsageContext.LLM_CLASS)
    233 self.request_counter = Counter()

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/engine/llm_engine.py:517, in LLMEngine.from_engine_args(cls, engine_args, usage_context, stat_loggers)
    515 executor_class = cls._get_executor_cls(engine_config)
    516 # Create the LLM engine.
--> 517 engine = cls(
    518     vllm_config=engine_config,
    519     executor_class=executor_class,
    520     log_stats=not engine_args.disable_log_stats,
    521     usage_context=usage_context,
    522     stat_loggers=stat_loggers,
    523 )
    525 return engine

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/engine/llm_engine.py:273, in LLMEngine.__init__(self, vllm_config, executor_class, log_stats, usage_context, stat_loggers, input_registry, mm_registry, use_cached_outputs)
    269 self.input_registry = input_registry
    270 self.input_processor = input_registry.create_input_processor(
    271     self.model_config)
--> 273 self.model_executor = executor_class(vllm_config=vllm_config, )
    275 if self.model_config.runner_type != "pooling":
    276     self._initialize_kv_caches()

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/executor/executor_base.py:36, in ExecutorBase.__init__(self, vllm_config)
     34 self.prompt_adapter_config = vllm_config.prompt_adapter_config
     35 self.observability_config = vllm_config.observability_config
---> 36 self._init_executor()

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/executor/gpu_executor.py:33, in GPUExecutor._init_executor(self)
     28 """Initialize the worker and load the model.
     29 """
     30 assert self.parallel_config.world_size == 1, (
     31     "GPUExecutor only supports single GPU.")
---> 33 self.driver_worker = self._create_worker()
     34 self.driver_worker.init_device()
     35 self.driver_worker.load_model()

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/executor/gpu_executor.py:59, in GPUExecutor._create_worker(self, local_rank, rank, distributed_init_method)
     55 def _create_worker(self,
     56                    local_rank: int = 0,
     57                    rank: int = 0,
     58                    distributed_init_method: Optional[str] = None):
---> 59     return create_worker(**self._get_worker_kwargs(
     60         local_rank=local_rank,
     61         rank=rank,
     62         distributed_init_method=distributed_init_method))

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/executor/gpu_executor.py:19, in create_worker(**kwargs)
     17 vllm_config = kwargs.get("vllm_config")
     18 wrapper = WorkerWrapperBase(vllm_config=vllm_config)
---> 19 wrapper.init_worker(**kwargs)
     20 return wrapper.worker

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/worker_base.py:452, in WorkerWrapperBase.init_worker(self, *args, **kwargs)
    448 load_general_plugins()
    450 worker_class = resolve_obj_by_qualname(
    451     self.vllm_config.parallel_config.worker_cls)
--> 452 self.worker = worker_class(*args, **kwargs)
    453 assert self.worker is not None

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/worker.py:82, in Worker.__init__(self, vllm_config, local_rank, rank, distributed_init_method, is_driver_worker, model_runner_cls)
     80 elif self.model_config.is_encoder_decoder:
     81     ModelRunnerClass = EncoderDecoderModelRunner
---> 82 self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
     83     vllm_config=self.vllm_config,
     84     kv_cache_dtype=self.cache_config.cache_dtype,
     85     is_driver_worker=is_driver_worker,
     86     **speculative_args,
     87 )
     88 if model_runner_cls is not None:
     89     self.model_runner = model_runner_cls(self.model_runner)

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/model_runner.py:1051, in GPUModelRunnerBase.__init__(self, vllm_config, kv_cache_dtype, is_driver_worker, return_hidden_states, input_registry, mm_registry)
   1046 num_attn_heads = self.model_config.get_num_attention_heads(
   1047     self.parallel_config)
   1048 needs_attn_backend = (num_attn_heads != 0
   1049     or self.model_config.is_attention_free)
-> 1051 self.attn_backend = get_attn_backend(
   1052     self.model_config.get_head_size(),
   1053     self.model_config.dtype,
   1054     self.kv_cache_dtype,
   1055     self.block_size,
   1056     self.model_config.is_attention_free,
   1057 ) if needs_attn_backend else None
   1058 if self.attn_backend:
   1059     self.attn_state = self.attn_backend.get_state_cls()(
   1060         weakref.proxy(self))

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:90, in get_attn_backend(head_size, dtype, kv_cache_dtype, block_size, is_attention_free, is_blocksparse)
     85 """Selects which attention backend to use and lazily imports it."""
     86 # Accessing envs.* behind an @lru_cache decorator can cause the wrong
     87 # value to be returned from the cache if the value changes between calls.
     88 # To avoid this, we read envs.VLLM_USE_V1 here and pass it explicitly to the
     89 # private function.
---> [90](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:90) return _cached_get_attn_backend(
[91](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:91) head_size=head_size,
[92](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:92) dtype=dtype,
[93](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:93) kv_cache_dtype=kv_cache_dtype,
[94](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:94) block_size=block_size,
[95](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:95) is_attention_free=is_attention_free,
[96](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:96) is_blocksparse=is_blocksparse,
[97](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:97) use_v1=envs.VLLM_USE_V1,
[98](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:98) )
File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:117, in _cached_get_attn_backend(head_size, dtype, kv_cache_dtype, block_size, is_attention_free, is_blocksparse, use_v1)
[113](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:113) from vllm.attention.backends.blocksparse_attn import (
[114](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:114) BlocksparseFlashAttentionBackend)
[115](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:115) return BlocksparseFlashAttentionBackend
--> [117](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:117) backend = which_attn_to_use(head_size, dtype, kv_cache_dtype, block_size,
[118](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:118) is_attention_free, use_v1)
[119](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:119) if backend == _Backend.FLASH_ATTN:
[120](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:120) logger.info("Using Flash Attention backend.")
File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:215, in which_attn_to_use(head_size, dtype, kv_cache_dtype, block_size, is_attention_free, use_v1)
[213](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:213) # FlashAttn in NVIDIA GPUs.
[214](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:214) if selected_backend == _Backend.FLASH_ATTN:
--> [215](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:215) if not current_platform.has_device_capability(80):
[216](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:216) # Volta and Turing NVIDIA GPUs.
[217](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:217) logger.info(
[218](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:218) "Cannot use FlashAttention-2 backend for Volta and Turing "
[219](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:219) "GPUs.")
[220](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:220) selected_backend = _Backend.XFORMERS
File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:68, in with_nvml_context.<locals>.wrapper(*args, **kwargs)
[66](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:66) pynvml.nvmlInit()
[67](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:67) try:
---> [68](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:68) return fn(*args, **kwargs)
[69](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:69) finally:
[70](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:70) pynvml.nvmlShutdown()
File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:174, in NvmlCudaPlatform.has_device_capability(cls, capability, device_id)
[165](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:165) @classmethod
[166](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:166) @lru_cache(maxsize=8)
[167](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:167) @with_nvml_context
(...)
[171](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:171) device_id: int = 0,
[172](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:172) ) -> bool:
[173](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:173) try:
--> [174](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:174) return super().has_device_capability(capability, device_id)
[175](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:175) except RuntimeError:
[176](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:176) return False
File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:141, in Platform.has_device_capability(cls, capability, device_id)
[127](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:127) @classmethod
[128](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:128) def has_device_capability(
[129](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:129) cls,
[130](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:130) capability: Union[Tuple[int, int], int],
[131](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:131) device_id: int = 0,
[132](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:132) ) -> bool:
[133](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:133) """
[134](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:134) Test whether this platform is compatible with a device capability.
[135](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:135)
(...)
[139](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:139) - An integer ``<major><minor>``. (See :meth:`DeviceCapability.to_int`)
[140](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:140) """
--> [141](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:141) current_capability = cls.get_device_capability(device_id=device_id)
[142](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:142) if current_capability is None:
[143](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:143) return False
File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:68, in with_nvml_context.<locals>.wrapper(*args, **kwargs)
[66](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:66) pynvml.nvmlInit()
[67](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:67) try:
---> [68](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:68) return fn(*args, **kwargs)
[69](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:69) finally:
[70](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:70) pynvml.nvmlShutdown()
File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:160, in NvmlCudaPlatform.get_device_capability(cls, device_id)
[158](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:158) physical_device_id = device_id_to_physical_device_id(device_id)
[159](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:159) handle = pynvml.nvmlDeviceGetHandleByIndex(physical_device_id)
--> [160](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:160) major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
[161](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:161) return DeviceCapability(major=major, minor=minor)
[162](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:162) except RuntimeError:
AttributeError: module 'pynvml' has no attribute 'nvmlDeviceGetCudaComputeCapability'
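The AttributeError suggests that the `pynvml` module Python imports is not the `nvidia-ml-py` NVML bindings that vLLM depends on; an unrelated distribution also named `pynvml` can shadow them and lacks some NVML symbols. A minimal, hedged diagnostic (the helper name and return values are illustrative, not part of vLLM):

```python
import importlib.util


def diagnose_pynvml() -> str:
    """Best-effort check of which NVML bindings `import pynvml` resolves to.

    vLLM expects the `nvidia-ml-py` distribution, whose `pynvml` module
    exposes nvmlDeviceGetCudaComputeCapability. If that symbol is absent,
    a conflicting `pynvml` wheel is likely shadowing it.
    """
    if importlib.util.find_spec("pynvml") is None:
        return "missing"
    import pynvml
    if hasattr(pynvml, "nvmlDeviceGetCudaComputeCapability"):
        return "ok"
    # Likely the conflicting 'pynvml' wheel; reinstalling usually resolves it:
    #   pip uninstall -y pynvml && pip install -U nvidia-ml-py
    return "shadowed"


print(diagnose_pynvml())
```

If this prints `shadowed`, uninstalling the stray `pynvml` package and reinstalling `nvidia-ml-py` is a plausible fix for the traceback above.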