
Conversation

akram
Contributor

@akram akram commented Oct 3, 2025

  • Attempt model discovery first for backward compatibility
  • If discovery fails and refresh_models=false, continue without error
  • If discovery fails and refresh_models=true, fail hard with ValueError
  • Supports dynamic token authentication scenarios

Fixes authentication issues when the vLLM service requires dynamic tokens

What does this PR do?

Implements graceful model discovery for the vLLM provider.
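
For reviewers, here is a rough sketch of the intended control flow. This is a minimal illustration, not the adapter's actual code: the function name verify_model, the client object, and its models.list() call are stand-ins for the real vLLM adapter internals; only the refresh_models behaviour and the warning wording follow the description above.

import logging

log = logging.getLogger("inference::vllm")

def verify_model(client, model_id: str, refresh_models: bool) -> None:
    """Try live model discovery; tolerate failure unless refresh_models is True."""
    try:
        # Attempt discovery first, for backward compatibility with static-token setups.
        available = {m.id for m in client.models.list()}
    except Exception as exc:  # e.g. 401 when the endpoint expects a dynamic token
        if refresh_models:
            # refresh_models=true: discovery is required, so fail hard.
            raise ValueError(f"Model discovery failed for {model_id}: {exc}") from exc
        # refresh_models=false: warn and continue without the live check.
        log.warning("Model verification failed for model %s with error %s", model_id, exc)
        log.warning("Continuing without live check (refresh_models=false).")
        return
    if model_id not in available:
        raise ValueError(f"Model {model_id} not found on the vLLM server")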

Test Plan

LLAMA_STACK_LOGGING="all=debug" VLLM_URL=https://my-vllm-server:8443/v1  MILVUS_DB_PATH=./milvus.db INFERENCE_MODEL=vllm uv run --with llama-stack llama stack build --distro starter --image-type venv --run

and use the following configuration:

providers:
  inference:
  - provider_id: ${env.VLLM_URL:+vllm}
    provider_type: remote::vllm
    config:
      url: ${env.VLLM_URL:=}
      max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
      api_token: ${env.VLLM_API_TOKEN:=fake}
      tls_verify: ${env.VLLM_TLS_VERIFY:=true}
      refresh_models: ${env.VLLM_REFRESH_MODELS:=false}

models:
- metadata:
    display_name: vllm
  model_id: vllm
  provider_id: vllm
  model_type: llm
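
If I read the env-substitution syntax right, ${env.VLLM_URL:+vllm} enables the provider only when VLLM_URL is set, and ${env.VAR:=default} falls back to the given default (the fake api_token, tls_verify true, refresh_models false) when the variable is unset, so the warnings below correspond to the default refresh_models=false path.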
      

Start the server; you should see:

WARNING  2025-10-03 20:40:41,114 llama_stack.providers.remote.inference.vllm.vllm:443 inference::vllm: Model verification failed for model vllm/vllm
         with error <some error>
WARNING  2025-10-03 20:40:41,115 llama_stack.providers.remote.inference.vllm.vllm:444 inference::vllm: Continuing without live check
         (refresh_models=false).
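
With refresh_models=true, the same discovery failure would instead raise a ValueError and abort startup, per the behaviour described in the summary above.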

@meta-cla meta-cla bot added the CLA Signed label Oct 3, 2025
@akram akram force-pushed the feature/vllm-graceful-model-discovery branch 2 times, most recently from f6c3e91 to 39ff06d on October 3, 2025 18:59
- Attempt model discovery first for backward compatibility
- If discovery fails and refresh_models=false, continue without error
- If discovery fails and refresh_models=true, fail hard with ValueError
- Supports dynamic token authentication scenarios

Fixes OAuth authentication issues when the vLLM service requires dynamic tokens
@akram akram force-pushed the feature/vllm-graceful-model-discovery branch from 39ff06d to 2b54b57 on October 3, 2025 19:32
@leseb
Collaborator

leseb commented Oct 6, 2025

Closing for #3677

@leseb leseb closed this Oct 6, 2025