
[RFE] Enable local LLM provider support (Ollama) for development and testing #784

@anik120

Description

Is your feature request related to a problem? Please describe.

Currently, Lightspeed Stack officially supports only cloud-based LLM providers (OpenAI, Azure, RHOAI, RHEL AI). This creates significant barriers for developers and users trying to:

  • Get started with the project
  • Test and develop locally
  • Control costs during development
  • Work offline or in air-gapped environments
  • Learn the system without cloud dependencies

Current Situation

From docs/providers.md:

  Provider                Type      Supported in LCS
  ollama                  remote
  meta-reference          inline
  sentence-transformers   inline

Developers currently must:

  1. Create an OpenAI/Azure account
  2. Add payment information
  3. Manage API keys and quotas
  4. Pay for every test query during development
  5. Wait for quota resets when limits are hit

Example: A developer following the getting started guide encounters:

RateLimitError: You exceeded your current quota

This blocks them from testing basic functionality without adding credits.

Describe the solution you'd like

Add official support for Ollama as a local inference provider.

Ollama provides:

  • Free, unlimited local inference (no API keys, no quotas)
  • Easy installation (brew install ollama / curl -fsSL https://ollama.ai/install.sh | sh)
  • Production-quality models (Llama 3, Mistral, Phi, etc.)
  • OpenAI-compatible API, so integration effort is minimal (see the example after this list)
  • Active community and regular updates
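
Because the API surface matches OpenAI's, a plain curl against a locally running Ollama server is enough to exercise it (a sketch; it assumes llama3.2 has already been pulled and Ollama is listening on its default port 11434):

# Chat completion via Ollama's OpenAI-compatible endpoint
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'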

Use Cases

  1. Getting Started Experience

New developers should be able to run:

# Install Ollama
brew install ollama

# Pull a model
ollama pull llama3.2

# Start Lightspeed Stack (no cloud setup needed!)
OLLAMA_MODEL=llama3.2 make run

  2. Development & Testing
  • Run unit/integration tests without API costs
  • Iterate quickly without rate limits
  • Test RAG pipelines with local embeddings
  • Develop offline
  3. CI/CD Pipelines (see the workflow sketch after this list)
  • Run E2E tests in GitHub Actions without managing secrets
  • No quota concerns for parallel test runs
  • Reproducible test environments
  4. Educational & Demo Purposes
  • Workshop attendees don't need cloud accounts
  • Demo the system without internet connectivity
  • Training environments without budget concerns
  5. Privacy-Sensitive Development
  • Test with sensitive data locally
  • Develop features for air-gapped deployments
  • Comply with data sovereignty requirements
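
For the CI/CD use case, the job setup could be as small as the following (a hypothetical sketch: the install script is the one Ollama documents, while `make test-e2e` stands in for whatever test entry point the repo actually uses):

# Hypothetical CI job steps (sketch only)
curl -fsSL https://ollama.ai/install.sh | sh   # install Ollama on the runner
ollama serve &                                 # start the server in the background
sleep 5                                        # give the server a moment to come up
ollama pull llama3.2                           # fetch the model once per job
OLLAMA_MODEL=llama3.2 make test-e2e            # placeholder target for the E2E suite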

Benefits

  1. For Developers:
  • Zero cost for development
  • Faster iteration (no network latency)
  • No quota limits during testing
  • Lower barrier to contribution
  2. For the Project:
  • Increased adoption (easier onboarding)
  • Better test coverage (developers can test more)
  • Broader contributor base (global accessibility)
  • Production flexibility (hybrid cloud/local deployments)
  3. For Users:
  • Try before committing to cloud providers
  • Development/staging environments without cloud costs
  • Option for fully private deployments

Example Configuration

examples/ollama-run.yaml:

  version: '2'
  image_name: local-development-stack

  providers:
    inference:
      - provider_id: ollama
        provider_type: remote::ollama
        config:
          base_url: http://localhost:11434

  models:
    - model_id: llama3.2
      provider_id: ollama
      model_type: llm
      provider_model_id: llama3.2

    - model_id: nomic-embed-text
      provider_id: ollama
      model_type: embedding
      provider_model_id: nomic-embed-text
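
As a quick sanity check that the embedding model referenced above is actually served locally, Ollama's OpenAI-compatible embeddings endpoint can be queried directly (a sketch, assuming the model has been pulled first):

ollama pull nomic-embed-text

curl http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": "hello world"}'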

Describe alternatives you've considered

  1. Continue cloud-only: creates unnecessary barriers
  2. Use sentence-transformers only: insufficient, since it supports only embeddings, not chat
  3. Support meta-reference (direct Llama inference): more complex setup than Ollama
  4. Support multiple local providers: start with Ollama first, expand later

Additional context

  • Similar projects (LangChain, LlamaIndex) prominently feature local model support as a first-class option alongside cloud providers.

  • Ollama is already listed in llama-stack's provider list; it just needs dependency installation and testing.

This looks like a high-impact, medium-effort improvement that significantly reduces friction for new contributors and developers. The infrastructure is already in place via llama-stack; we primarily need:

  • Dependency addition
  • Example configurations
  • Documentation updates
  • Test validation
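
For the dependency piece, the change is likely small; a sketch (the exact command depends on the project's packaging tooling, and the package name assumes llama-stack's remote::ollama provider uses the Ollama Python client):

# Hypothetical dependency addition; adjust to the project's packaging setup
pip install ollama   # client library the remote::ollama provider is expected to need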

Metadata

Labels: enhancement (New feature or request)