
Conversation

akram
Contributor

@akram akram commented Oct 3, 2025

  • Attempt model discovery first for backward compatibility
  • If discovery fails and refresh_models=false, continue without error
  • If discovery fails and refresh_models=true, fail hard with ValueError
  • Supports dynamic token authentication scenarios

Fixes authentication issues when the vLLM service requires dynamic tokens

What does this PR do?

Implements graceful model discovery for the vLLM provider.
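
For reviewers, here is a rough sketch of the intended control flow. This is a minimal illustration, not the adapter's actual code: the function name verify_model, the client object, and its models.list() call are stand-ins for the real vLLM adapter internals; only the refresh_models behaviour and the warning wording follow the description above.

import logging

log = logging.getLogger("inference::vllm")

def verify_model(client, model_id: str, refresh_models: bool) -> None:
    """Try live model discovery; tolerate failure unless refresh_models is True."""
    try:
        # Attempt discovery first, for backward compatibility with static-token setups.
        available = {m.id for m in client.models.list()}
    except Exception as exc:  # e.g. 401 when the endpoint expects a dynamic token
        if refresh_models:
            # refresh_models=true: discovery is required, so fail hard.
            raise ValueError(f"Model discovery failed for {model_id}: {exc}") from exc
        # refresh_models=false: warn and continue without the live check.
        log.warning("Model verification failed for model %s with error %s", model_id, exc)
        log.warning("Continuing without live check (refresh_models=false).")
        return
    if model_id not in available:
        raise ValueError(f"Model {model_id} not found on the vLLM server")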

Test Plan

LLAMA_STACK_LOGGING="all=debug" VLLM_URL=https://my-vllm-server:8443/v1  MILVUS_DB_PATH=./milvus.db INFERENCE_MODEL=vllm uv run --with llama-stack llama stack build --distro starter --image-type venv --run

and use the following configuration:

providers:
  inference:
  - provider_id: ${env.VLLM_URL:+vllm}
    provider_type: remote::vllm
    config:
      url: ${env.VLLM_URL:=}
      max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
      api_token: ${env.VLLM_API_TOKEN:=fake}
      tls_verify: ${env.VLLM_TLS_VERIFY:=true}
      refresh_models: ${env.VLLM_REFRESH_MODELS:=false}

models:
- metadata:
    display_name: vllm
  model_id: vllm
  provider_id: vllm
  model_type: llm
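
If I read the env-substitution syntax right, ${env.VLLM_URL:+vllm} enables the provider only when VLLM_URL is set, and ${env.VAR:=default} falls back to the given default (the fake api_token, tls_verify true, refresh_models false) when the variable is unset, so the warnings below correspond to the default refresh_models=false path.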
      

Start the server; you should see:

WARNING  2025-10-03 20:40:41,114 llama_stack.providers.remote.inference.vllm.vllm:443 inference::vllm: Model verification failed for model vllm/vllm
         with error <some error>
WARNING  2025-10-03 20:40:41,115 llama_stack.providers.remote.inference.vllm.vllm:444 inference::vllm: Continuing without live check
         (refresh_models=false).
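
With refresh_models=true, the same discovery failure would instead raise a ValueError and abort startup, per the behaviour described in the summary above.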

@meta-cla meta-cla bot added the CLA Signed label Oct 3, 2025
@akram akram force-pushed the feature/vllm-graceful-model-discovery branch 2 times, most recently from f6c3e91 to 39ff06d on October 3, 2025 18:59
- Attempt model discovery first for backward compatibility
- If discovery fails and refresh_models=false, continue without error
- If discovery fails and refresh_models=true, fail hard with ValueError
- Supports dynamic token authentication scenarios

Fixes OAuth authentication issues when the vLLM service requires dynamic tokens
@akram akram force-pushed the feature/vllm-graceful-model-discovery branch from 39ff06d to 2b54b57 on October 3, 2025 19:32
@leseb
Collaborator

leseb commented Oct 6, 2025

Closing for #3677

@leseb leseb closed this Oct 6, 2025