
[Bug]: CohereForAI/c4ai-command-r-v01 : ValueError: User-specified max_model_len (131072) is greater than the derived max_model_len (None=8192 in model's config.json). This may lead to incorrect model outputs or CUDA errors. Make sure the value is correct and within the model > #3676


Description

@pseudotensor

Your current environment

Head of main, after the various Cohere updates/fixes.

Setup commands:

sudo apt update
sudo apt install libnccl2 libnccl-dev

wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run
sudo sh cuda_12.1.0_530.30.02_linux.run
sudo chmod -R a+rwx /usr/local/

export CUDA_HOME=/usr/local/cuda-12.1
export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cu121"

conda create -n vllm_cuda12.1 -y
conda activate vllm_cuda12.1
conda install python=3.10 -y

pip install git+https://github.com/vllm-project/vllm.git
pip install hf_transfer
pip install tiktoken accelerate flash_attn

export HF_HUB_ENABLE_HF_TRANSFER=1
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/lib64:$HOME/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin

export CUDA_VISIBLE_DEVICES="0,1"
python -m vllm.entrypoints.openai.api_server --port=5005 --host=0.0.0.0 --model CohereForAI/c4ai-command-r-v01 --seed 1234 --tensor-parallel-size=2 --max-num-batched-tokens=131072 --max-log-len=100  --max-model-len 131072
# --trust-remote-code

I have to comment out --trust-remote-code due to a bug in their model: there is a PR to register the model name, but it isn't merged yet.
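
For reference, the same check can be hit without the API server by building the engine through vLLM's offline LLM entry point. This is only a sketch mirroring the CLI flags above (the kwargs are forwarded to EngineArgs), not a separate repro I ran:

```python
from vllm import LLM

# Sketch: same engine arguments as the CLI command above.
# Engine construction validates max_model_len against the value derived
# from the model's config.json, so this raises the same ValueError.
llm = LLM(
    model="CohereForAI/c4ai-command-r-v01",
    tensor_parallel_size=2,
    seed=1234,
    max_num_batched_tokens=131072,
    max_model_len=131072,  # larger than the derived 8192 -> ValueError
)
```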

🐛 Describe the bug

INFO 03-28 08:14:18 api_server.py:147] vLLM API server version 0.3.3
INFO 03-28 08:14:18 api_server.py:148] args: Namespace(host='0.0.0.0', port=5005, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name=Non>
Traceback (most recent call last):
  File "/home/fsuser/miniconda3/envs/vllm_cuda12.1/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/fsuser/miniconda3/envs/vllm_cuda12.1/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/fsuser/miniconda3/envs/vllm_cuda12.1/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 156, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/home/fsuser/miniconda3/envs/vllm_cuda12.1/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 327, in from_engine_args
    engine_configs = engine_args.create_engine_configs()
  File "/home/fsuser/miniconda3/envs/vllm_cuda12.1/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 362, in create_engine_configs
    model_config = ModelConfig(
  File "/home/fsuser/miniconda3/envs/vllm_cuda12.1/lib/python3.10/site-packages/vllm/config.py", line 124, in __init__
    self.max_model_len = _get_and_verify_max_len(self.hf_text_config,
  File "/home/fsuser/miniconda3/envs/vllm_cuda12.1/lib/python3.10/site-packages/vllm/config.py", line 791, in _get_and_verify_max_len
    raise ValueError(
ValueError: User-specified max_model_len (131072) is greater than the derived max_model_len (None=8192 in model's config.json). This may lead to incorrect model outputs or CUDA errors. Make sure the value is correct and within the model >
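
The derived limit appears to come from the context-length fields in the model's config.json. A quick sketch to see which of those fields the Cohere config actually exposes (the exact keys vLLM consults are in _get_and_verify_max_len in vllm/config.py; the names below are the usual suspects, not an exhaustive list):

```python
from transformers import AutoConfig

# Sketch: print the context-length-related fields that vLLM can derive
# max_model_len from. Requires a transformers version that ships the
# Cohere config; otherwise trust_remote_code=True would be needed.
cfg = AutoConfig.from_pretrained("CohereForAI/c4ai-command-r-v01")

for key in (
    "max_position_embeddings",
    "model_max_length",
    "n_positions",
    "seq_length",
    "max_sequence_length",
):
    print(key, getattr(cfg, key, None))
```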
