
[Usage]: v0.6.6.post1 is incompatible with pynvml==12.0.0 #12386

@sharafeddeen

Description

Your current environment

The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.1 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version: Could not collect
CMake version: version 3.31.4
Libc version: glibc-2.39

Python version: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: 12.6.20
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4070 Laptop GPU
Nvidia driver version: 566.36
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        39 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               16
On-line CPU(s) list:                  0-15
Vendor ID:                            GenuineIntel
Model name:                           13th Gen Intel(R) Core(TM) i7-13620H
CPU family:                           6
Model:                                186
Thread(s) per core:                   2
Core(s) per socket:                   8
Socket(s):                            1
Stepping:                             2
BogoMIPS:                             5836.80
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization:                       VT-x
Hypervisor vendor:                    Microsoft
Virtualization type:                  full
L1d cache:                            384 KiB (8 instances)
L1i cache:                            256 KiB (8 instances)
L2 cache:                             10 MiB (8 instances)
L3 cache:                             24 MiB (1 instance)

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-ml-py3==7.352.0
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] pynvml==11.5.3
[pip3] pyzmq==26.2.0
[pip3] torch==2.5.1
[pip3] torchaudio==2.5.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.48.1
[pip3] triton==3.1.0
[pip3] vllm_nccl_cu12==2.18.1.0.4.0
[conda] blas                      1.0                         mkl
[conda] cuda-cccl                 12.6.77                       0    nvidia
[conda] cuda-cccl_linux-64        12.6.77                       0    nvidia
[conda] cuda-command-line-tools   12.1.1                        0    nvidia
[conda] cuda-compiler             12.6.2                        0    nvidia
[conda] cuda-crt-dev_linux-64     12.6.20                       0    nvidia
[conda] cuda-crt-tools            12.6.20                       0    nvidia
[conda] cuda-cudart               12.1.105                      0    nvidia
[conda] cuda-cudart-dev           12.1.105                      0    nvidia
[conda] cuda-cudart-dev_linux-64  12.6.77                       0    nvidia
[conda] cuda-cudart-static        12.6.77                       0    nvidia
[conda] cuda-cudart-static_linux-64 12.6.77                       0    nvidia
[conda] cuda-cudart_linux-64      12.6.77                       0    nvidia
[conda] cuda-cuobjdump            12.6.77                       0    nvidia
[conda] cuda-cupti                12.1.105                      0    nvidia
[conda] cuda-cuxxfilt             12.6.77                       0    nvidia
[conda] cuda-documentation        12.4.127                      0    nvidia
[conda] cuda-driver-dev           12.6.77                       0    nvidia
[conda] cuda-driver-dev_linux-64  12.6.77                       0    nvidia
[conda] cuda-gdb                  12.6.77                       0    nvidia
[conda] cuda-libraries            12.1.0                        0    nvidia
[conda] cuda-libraries-dev        12.6.2                        0    nvidia
[conda] cuda-libraries-static     12.6.2                        0    nvidia
[conda] cuda-nsight               12.6.77                       0    nvidia
[conda] cuda-nvcc                 12.6.20                       0    nvidia
[conda] cuda-nvcc-dev_linux-64    12.6.20                       0    nvidia
[conda] cuda-nvcc-impl            12.6.20                       0    nvidia
[conda] cuda-nvcc-tools           12.6.20                       0    nvidia
[conda] cuda-nvcc_linux-64        12.6.20                       0    nvidia
[conda] cuda-nvdisasm             12.6.77                       0    nvidia
[conda] cuda-nvml-dev             12.6.77                       2    nvidia
[conda] cuda-nvprof               12.6.80                       0    nvidia
[conda] cuda-nvprune              12.6.77                       0    nvidia
[conda] cuda-nvrtc                12.1.105                      0    nvidia
[conda] cuda-nvrtc-dev            12.1.105                      0    nvidia
[conda] cuda-nvrtc-static         12.6.85                       0    nvidia
[conda] cuda-nvtx                 12.1.105                      0    nvidia
[conda] cuda-nvvm-dev_linux-64    12.6.20                       0    nvidia
[conda] cuda-nvvm-impl            12.6.20                       0    nvidia
[conda] cuda-nvvm-tools           12.6.20                       0    nvidia
[conda] cuda-nvvp                 12.6.80                       0    nvidia
[conda] cuda-opencl               12.6.77                       0    nvidia
[conda] cuda-opencl-dev           12.6.77                       0    nvidia
[conda] cuda-profiler-api         12.6.77                       0    nvidia
[conda] cuda-runtime              12.1.0                        0    nvidia
[conda] cuda-sanitizer-api        12.6.77                       0    nvidia
[conda] cuda-toolkit              12.1.0                        0    nvidia
[conda] cuda-tools                12.1.1                        0    nvidia
[conda] cuda-version              12.6                          3    nvidia
[conda] cuda-visual-tools         12.6.2                        0    nvidia
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] gds-tools                 1.11.1.6                      0    nvidia
[conda] libcublas                 12.1.0.26                     0    nvidia
[conda] libcublas-dev             12.1.0.26                     0    nvidia
[conda] libcublas-static          12.6.4.1                      0    nvidia
[conda] libcufft                  11.0.2.4                      0    nvidia
[conda] libcufft-dev              11.0.2.4                      0    nvidia
[conda] libcufft-static           11.3.0.4                      0    nvidia
[conda] libcufile                 1.11.1.6                      0    nvidia
[conda] libcufile-dev             1.11.1.6                      0    nvidia
[conda] libcufile-static          1.11.1.6                      0    nvidia
[conda] libcurand                 10.3.7.77                     0    nvidia
[conda] libcurand-dev             10.3.7.77                     0    nvidia
[conda] libcurand-static          10.3.7.77                     0    nvidia
[conda] libcusolver               11.4.4.55                     0    nvidia
[conda] libcusolver-dev           11.4.4.55                     0    nvidia
[conda] libcusolver-static        11.7.1.2                      0    nvidia
[conda] libcusparse               12.0.2.55                     0    nvidia
[conda] libcusparse-dev           12.0.2.55                     0    nvidia
[conda] libcusparse-static        12.5.4.2                      0    nvidia
[conda] libjpeg-turbo             2.0.0                h9bf148f_0    pytorch
[conda] libnpp                    12.0.2.50                     0    nvidia
[conda] libnpp-dev                12.0.2.50                     0    nvidia
[conda] libnpp-static             12.3.1.54                     0    nvidia
[conda] libnvfatbin               12.6.77                       0    nvidia
[conda] libnvfatbin-dev           12.6.77                       0    nvidia
[conda] libnvfatbin-static        12.6.77                       0    nvidia
[conda] libnvjitlink              12.1.105                      0    nvidia
[conda] libnvjitlink-dev          12.1.105                      0    nvidia
[conda] libnvjitlink-static       12.6.85                       0    nvidia
[conda] libnvjpeg                 12.1.1.14                     0    nvidia
[conda] libnvjpeg-dev             12.1.1.14                     0    nvidia
[conda] libnvjpeg-static          12.3.3.54                     0    nvidia
[conda] libnvvm-samples           12.1.105                      0    nvidia
[conda] mkl                       2023.1.0         h213fc3f_46344
[conda] mkl-service               2.4.0           py310h5eee18b_2
[conda] mkl_fft                   1.3.11          py310h5eee18b_0
[conda] mkl_random                1.2.8           py310h1128e8f_0
[conda] nsight-compute            2024.3.2.3                    0    nvidia
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-cublas-cu12        12.4.5.8                 pypi_0    pypi
[conda] nvidia-cuda-cupti-cu12    12.4.127                 pypi_0    pypi
[conda] nvidia-cuda-nvrtc-cu12    12.4.127                 pypi_0    pypi
[conda] nvidia-cuda-runtime-cu12  12.4.127                 pypi_0    pypi
[conda] nvidia-cudnn-cu12         9.1.0.70                 pypi_0    pypi
[conda] nvidia-cufft-cu12         11.2.1.3                 pypi_0    pypi
[conda] nvidia-curand-cu12        10.3.5.147               pypi_0    pypi
[conda] nvidia-cusolver-cu12      11.6.1.9                 pypi_0    pypi
[conda] nvidia-cusparse-cu12      12.3.1.170               pypi_0    pypi
[conda] nvidia-ml-py              12.560.30                pypi_0    pypi
[conda] nvidia-ml-py3             7.352.0                  pypi_0    pypi
[conda] nvidia-nccl-cu12          2.21.5                   pypi_0    pypi
[conda] nvidia-nvjitlink-cu12     12.4.127                 pypi_0    pypi
[conda] nvidia-nvtx-cu12          12.4.127                 pypi_0    pypi
[conda] pynvml                    11.5.3                   pypi_0    pypi
[conda] pytorch-cuda              12.1                 ha16c6d3_6    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] pyzmq                     26.2.0          py310h71f11fc_3    conda-forge
[conda] torch                     2.5.1                    pypi_0    pypi
[conda] torchaudio                2.5.1               py310_cu121    pytorch
[conda] torchvision               0.20.1              py310_cu121    pytorch
[conda] transformers              4.48.1                   pypi_0    pypi
[conda] triton                    3.1.0                    pypi_0    pypi
[conda] vllm-nccl-cu12            2.18.1.0.4.0             pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.6.post1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X                              N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

LD_LIBRARY_PATH=/home/sharaf/anaconda3/envs/aimo/lib/python3.10/site-packages/cv2/../../lib64:
CUDA_MODULE_LOADING=LAZY

Model Input Dumps

No response

🐛 Describe the bug

description

  • the current vLLM version tries to read the CUDA compute capability via pynvml.nvmlDeviceGetCudaComputeCapability(handle) (vllm/platforms/cuda.py:160)
  • instead it raises: AttributeError: module 'pynvml' has no attribute 'nvmlDeviceGetCudaComputeCapability'
  • the error went away when I downgraded pynvml from 12.0.0 to 11.5.3

When I asked the chatbot on the docs page, it mentioned that the `nvidia-ml-py` upgrade was a fix for an earlier issue (https://github.com//issues/9821), which may be the root cause of this issue.
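For anyone hitting the same AttributeError, a quick diagnostic (a minimal sketch; the helper name is mine, not from vLLM) is to check whether the `pynvml` module you actually import exposes the symbol vLLM calls:

```python
import importlib.util


def pynvml_symbol_available(
    symbol: str = "nvmlDeviceGetCudaComputeCapability",
) -> bool:
    """Return True if an importable `pynvml` module exposes `symbol`.

    Both the standalone `pynvml` package and NVIDIA's `nvidia-ml-py`
    install a module named `pynvml`, so whichever was installed last
    determines which bindings you actually get at import time.
    """
    if importlib.util.find_spec("pynvml") is None:
        return False  # no pynvml module installed at all
    import pynvml
    return hasattr(pynvml, symbol)


print(pynvml_symbol_available())
```

If this prints False in the environment where vLLM runs, the failure above is expected before any model even loads.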

suggested action

I'm still not sure whether this is specific to my setup, but from my results there appears to be an incompatibility between vLLM and pynvml==12.0.0.
I did a bit of searching and found that pynvml mirrors nvidia-ml-py, which is listed in the dependency tree of this version of vLLM.
Without further research, I'd suggest pinning nvidia-ml-py in the required dependencies back to a version compatible with pynvml==11.5.3.
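In the meantime, two possible workarounds (only the first is what I actually verified; the second follows from pynvml and nvidia-ml-py both installing a module named `pynvml`, so treat it as a sketch):

```shell
# Option 1 (verified in my environment): pin pynvml back to the last
# version that still exposes nvmlDeviceGetCudaComputeCapability
pip install "pynvml==11.5.3"

# Option 2 (untested sketch): remove the standalone pynvml package so
# the `pynvml` module resolves to nvidia-ml-py, a vLLM dependency
# pip uninstall -y pynvml
# pip show nvidia-ml-py   # confirm the NVML bindings are still present
```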


responsible code

vllm = LLM(
    model=config.base_model if config.base_model else config.model_id,
    tensor_parallel_size=num_gpus,
    dtype="auto",
    quantization=None,
    gpu_memory_utilization=0.99,
    max_num_seqs=128,
    max_model_len=2048,
    enable_lora=True,
)

error message

Cell In[4], line 14
     12 else:
     13     quantization = None
---> 14 vllm = LLM(
     15     model=config.base_model if config.base_model else config.model_id,
     16     #tokenizer=config.base_model,
     17     tensor_parallel_size=num_gpus,
     18     dtype="auto",
     19     quantization=None,
     20     #quantization="bitsandbytes",
     21     #load_format="bitsandbytes",
     22     #swap_space=2,
     23     gpu_memory_utilization=0.99,
     24     #cpu_offload_gb=3,
     25     max_num_seqs=128,
     26     max_model_len=2048,
     27     #enforce_eager=True,
     28     enable_lora=True,
     29 )
     30 return vllm

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/utils.py:986, in deprecate_args.<locals>.wrapper.<locals>.inner(*args, **kwargs)
    979             msg += f" {additional_message}"
    981         warnings.warn(
    982             DeprecationWarning(msg),
    983             stacklevel=3,  # The inner function takes up one level
    984         )
--> 986 return fn(*args, **kwargs)

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/entrypoints/llm.py:230, in LLM.__init__(self, model, tokenizer, tokenizer_mode, skip_tokenizer_init, trust_remote_code, allowed_local_media_path, tensor_parallel_size, dtype, quantization, revision, tokenizer_revision, seed, gpu_memory_utilization, swap_space, cpu_offload_gb, enforce_eager, max_seq_len_to_capture, disable_custom_all_reduce, disable_async_output_proc, hf_overrides, mm_processor_kwargs, task, override_pooler_config, compilation_config, **kwargs)
    227 self.engine_class = self.get_engine_class()
    229 # TODO(rob): enable mp by default (issue with fork vs spawn)
--> 230 self.llm_engine = self.engine_class.from_engine_args(
    231     engine_args, usage_context=UsageContext.LLM_CLASS)
    233 self.request_counter = Counter()

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/engine/llm_engine.py:517, in LLMEngine.from_engine_args(cls, engine_args, usage_context, stat_loggers)
    515 executor_class = cls._get_executor_cls(engine_config)
    516 # Create the LLM engine.
--> 517 engine = cls(
    518     vllm_config=engine_config,
    519     executor_class=executor_class,
    520     log_stats=not engine_args.disable_log_stats,
    521     usage_context=usage_context,
    522     stat_loggers=stat_loggers,
    523 )
    525 return engine

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/engine/llm_engine.py:273, in LLMEngine.__init__(self, vllm_config, executor_class, log_stats, usage_context, stat_loggers, input_registry, mm_registry, use_cached_outputs)
    269 self.input_registry = input_registry
    270 self.input_processor = input_registry.create_input_processor(
    271     self.model_config)
--> 273 self.model_executor = executor_class(vllm_config=vllm_config, )
    275 if self.model_config.runner_type != "pooling":
    276     self._initialize_kv_caches()

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/executor/executor_base.py:36, in ExecutorBase.__init__(self, vllm_config)
     34 self.prompt_adapter_config = vllm_config.prompt_adapter_config
     35 self.observability_config = vllm_config.observability_config
---> 36 self._init_executor()

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/executor/gpu_executor.py:33, in GPUExecutor._init_executor(self)
     28 """Initialize the worker and load the model.
     29 """
     30 assert self.parallel_config.world_size == 1, (
     31     "GPUExecutor only supports single GPU.")
---> 33 self.driver_worker = self._create_worker()
     34 self.driver_worker.init_device()
     35 self.driver_worker.load_model()

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/executor/gpu_executor.py:59, in GPUExecutor._create_worker(self, local_rank, rank, distributed_init_method)
     55 def _create_worker(self,
     56                    local_rank: int = 0,
     57                    rank: int = 0,
     58                    distributed_init_method: Optional[str] = None):
---> 59     return create_worker(**self._get_worker_kwargs(
     60         local_rank=local_rank,
     61         rank=rank,
     62         distributed_init_method=distributed_init_method))

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/executor/gpu_executor.py:19, in create_worker(**kwargs)
     17 vllm_config = kwargs.get("vllm_config")
     18 wrapper = WorkerWrapperBase(vllm_config=vllm_config)
---> 19 wrapper.init_worker(**kwargs)
     20 return wrapper.worker

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/worker_base.py:452, in WorkerWrapperBase.init_worker(self, *args, **kwargs)
    448 load_general_plugins()
    450 worker_class = resolve_obj_by_qualname(
    451     self.vllm_config.parallel_config.worker_cls)
--> 452 self.worker = worker_class(*args, **kwargs)
    453 assert self.worker is not None

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/worker.py:82, in Worker.__init__(self, vllm_config, local_rank, rank, distributed_init_method, is_driver_worker, model_runner_cls)
     80 elif self.model_config.is_encoder_decoder:
     81     ModelRunnerClass = EncoderDecoderModelRunner
---> 82 self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
     83     vllm_config=self.vllm_config,
     84     kv_cache_dtype=self.cache_config.cache_dtype,
     85     is_driver_worker=is_driver_worker,
     86     **speculative_args,
     87 )
     88 if model_runner_cls is not None:
     89     self.model_runner = model_runner_cls(self.model_runner)

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/model_runner.py:1051, in GPUModelRunnerBase.__init__(self, vllm_config, kv_cache_dtype, is_driver_worker, return_hidden_states, input_registry, mm_registry)
   [1046](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/model_runner.py:1046) num_attn_heads = self.model_config.get_num_attention_heads(
   [1047](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/model_runner.py:1047)     self.parallel_config)
   [1048](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/model_runner.py:1048) needs_attn_backend = (num_attn_heads != 0
   [1049](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/model_runner.py:1049)                       or self.model_config.is_attention_free)
-> [1051](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/model_runner.py:1051) self.attn_backend = get_attn_backend(
   [1052](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/model_runner.py:1052)     self.model_config.get_head_size(),
   [1053](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/model_runner.py:1053)     self.model_config.dtype,
   [1054](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/model_runner.py:1054)     self.kv_cache_dtype,
   [1055](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/model_runner.py:1055)     self.block_size,
   [1056](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/model_runner.py:1056)     self.model_config.is_attention_free,
   [1057](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/model_runner.py:1057) ) if needs_attn_backend else None
   [1058](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/model_runner.py:1058) if self.attn_backend:
   [1059](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/model_runner.py:1059)     self.attn_state = self.attn_backend.get_state_cls()(
   [1060](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/worker/model_runner.py:1060)         weakref.proxy(self))

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:90, in get_attn_backend(head_size, dtype, kv_cache_dtype, block_size, is_attention_free, is_blocksparse)
     [85](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:85) """Selects which attention backend to use and lazily imports it."""
     [86](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:86) # Accessing envs.* behind an @lru_cache decorator can cause the wrong
     [87](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:87) # value to be returned from the cache if the value changes between calls.
     [88](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:88) # To avoid this, we read envs.VLLM_USE_V1 here and pass it explicitly to the
     [89](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:89) # private function.
---> [90](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:90) return _cached_get_attn_backend(
     [91](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:91)     head_size=head_size,
     [92](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:92)     dtype=dtype,
     [93](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:93)     kv_cache_dtype=kv_cache_dtype,
     [94](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:94)     block_size=block_size,
     [95](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:95)     is_attention_free=is_attention_free,
     [96](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:96)     is_blocksparse=is_blocksparse,
     [97](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:97)     use_v1=envs.VLLM_USE_V1,
     [98](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:98) )

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:117, in _cached_get_attn_backend(head_size, dtype, kv_cache_dtype, block_size, is_attention_free, is_blocksparse, use_v1)
    [113](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:113)     from vllm.attention.backends.blocksparse_attn import (
    [114](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:114)         BlocksparseFlashAttentionBackend)
    [115](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:115)     return BlocksparseFlashAttentionBackend
--> [117](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:117) backend = which_attn_to_use(head_size, dtype, kv_cache_dtype, block_size,
    [118](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:118)                             is_attention_free, use_v1)
    [119](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:119) if backend == _Backend.FLASH_ATTN:
    [120](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:120)     logger.info("Using Flash Attention backend.")

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:215, in which_attn_to_use(head_size, dtype, kv_cache_dtype, block_size, is_attention_free, use_v1)
    [213](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:213) # FlashAttn in NVIDIA GPUs.
    [214](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:214) if selected_backend == _Backend.FLASH_ATTN:
--> [215](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:215)     if not current_platform.has_device_capability(80):
    [216](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:216)         # Volta and Turing NVIDIA GPUs.
    [217](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:217)         logger.info(
    [218](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:218)             "Cannot use FlashAttention-2 backend for Volta and Turing "
    [219](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:219)             "GPUs.")
    [220](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/attention/selector.py:220)         selected_backend = _Backend.XFORMERS

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:68, in with_nvml_context.<locals>.wrapper(*args, **kwargs)
     [66](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:66) pynvml.nvmlInit()
     [67](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:67) try:
---> [68](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:68)     return fn(*args, **kwargs)
     [69](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:69) finally:
     [70](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:70)     pynvml.nvmlShutdown()

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:174, in NvmlCudaPlatform.has_device_capability(cls, capability, device_id)
    [165](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:165) @classmethod
    [166](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:166) @lru_cache(maxsize=8)
    [167](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:167) @with_nvml_context
   (...)
    [171](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:171)     device_id: int = 0,
    [172](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:172) ) -> bool:
    [173](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:173)     try:
--> [174](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:174)         return super().has_device_capability(capability, device_id)
    [175](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:175)     except RuntimeError:
    [176](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:176)         return False

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:141, in Platform.has_device_capability(cls, capability, device_id)
    [127](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:127) @classmethod
    [128](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:128) def has_device_capability(
    [129](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:129)     cls,
    [130](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:130)     capability: Union[Tuple[int, int], int],
    [131](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:131)     device_id: int = 0,
    [132](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:132) ) -> bool:
    [133](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:133)     """
    [134](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:134)     Test whether this platform is compatible with a device capability.
    [135](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:135) 
   (...)
    [139](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:139)     - An integer ``<major><minor>``. (See :meth:`DeviceCapability.to_int`)
    [140](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:140)     """
--> [141](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:141)     current_capability = cls.get_device_capability(device_id=device_id)
    [142](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:142)     if current_capability is None:
    [143](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/interface.py:143)         return False

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:68, in with_nvml_context.<locals>.wrapper(*args, **kwargs)
     [66](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:66) pynvml.nvmlInit()
     [67](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:67) try:
---> [68](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:68)     return fn(*args, **kwargs)
     [69](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:69) finally:
     [70](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:70)     pynvml.nvmlShutdown()

File ~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:160, in NvmlCudaPlatform.get_device_capability(cls, device_id)
    [158](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:158)     physical_device_id = device_id_to_physical_device_id(device_id)
    [159](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:159)     handle = pynvml.nvmlDeviceGetHandleByIndex(physical_device_id)
--> [160](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:160)     major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
    [161](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:161)     return DeviceCapability(major=major, minor=minor)
    [162](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/ad17a/OneDrive/Desktop/hackathons/aimo2/misc/aimo-progress-prize/~/anaconda3/envs/aimo/lib/python3.10/site-packages/vllm/platforms/cuda.py:162) except RuntimeError:

AttributeError: module 'pynvml' has no attribute 'nvmlDeviceGetCudaComputeCapability'
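The failing attribute lookup can be reproduced outside vLLM with a small probe. This is a minimal sketch, assuming the root cause is a conflict between the standalone `pynvml` 12.0.0 package and `nvidia-ml-py` (the distribution vLLM depends on, which also installs a module named `pynvml`); the helper name below is hypothetical.

```python
# Hypothetical probe: check whether the importable `pynvml` module exposes
# the NVML binding that vllm/platforms/cuda.py calls in the traceback above.
import importlib


def nvml_binding_present() -> bool:
    try:
        pynvml = importlib.import_module("pynvml")
    except ImportError:
        # No pynvml module at all; vLLM would fail earlier than this check.
        return False
    # The standalone pynvml 12.0.0 package no longer exposes this function
    # at the top level, while nvidia-ml-py's `pynvml` module does.
    return hasattr(pynvml, "nvmlDeviceGetCudaComputeCapability")


if __name__ == "__main__":
    print(nvml_binding_present())
```

If the probe returns False, the workaround reported for this incompatibility is to uninstall the standalone package (`pip uninstall pynvml`) so that the module shipped by `nvidia-ml-py` is imported instead; verify against your own environment before relying on this.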

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Labels: bug (Something isn't working)