Is your feature request related to a problem? Please describe.
Currently, Lightspeed Stack only officially supports cloud-based LLM providers (OpenAI, Azure, RHOAI, RHEL AI). This creates significant barriers for developers and users trying to:
- Get started with the project
- Test and develop locally
- Control costs during development
- Work offline or in air-gapped environments
- Learn the system without cloud dependencies
Current Situation
From docs/providers.md:
| Provider | Type | Supported in LCS |
|---|---|---|
| ollama | remote | ❌ |
| meta-reference | inline | ❌ |
| sentence-transformers | inline | ❌ |
Developers currently must:
- Create an OpenAI/Azure account
- Add payment information
- Manage API keys and quotas
- Pay for every test query during development
- Wait for quota resets when limits are hit
Example: A developer following the getting started guide encounters:
RateLimitError: You exceeded your current quota
This blocks them from testing basic functionality without adding credits.
Describe the solution you'd like
Add official support for Ollama as a local inference provider.
Ollama provides:
- Free, unlimited local inference (no API keys, no quotas)
- Easy installation (brew install ollama / curl -fsSL https://ollama.ai/install.sh | sh)
- Production-quality models (Llama 3, Mistral, Phi, etc.)
- OpenAI-compatible API (minimal integration effort)
- Active community and regular updates
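To illustrate the OpenAI-compatible API point, a minimal smoke test against a locally running Ollama instance might look like this (a sketch only; it assumes llama3.2 has already been pulled and Ollama is listening on its default port):

# Chat completion via Ollama's OpenAI-compatible endpoint
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'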
Use Cases
- Getting Started Experience
  New developers should be able to run:
  # Install Ollama
  brew install ollama
  # Pull a model
  ollama pull llama3.2
  # Start Lightspeed Stack (no cloud setup needed!)
  OLLAMA_MODEL=llama3.2 make run
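  # Optional sanity check (a sketch, not part of the proposal): confirm the
  # Ollama daemon is serving and the model is available before starting the stack
  curl -s http://localhost:11434/api/tags   # lists locally available models
  ollama list                               # equivalent CLI view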
- Development & Testing
  - Run unit/integration tests without API costs
  - Iterate quickly without rate limits
  - Test RAG pipelines with local embeddings (see the sketch after this list)
  - Develop offline
- CI/CD Pipelines
  - Run E2E tests in GitHub Actions without managing secrets
  - No quota concerns for parallel test runs
  - Reproducible test environment
- Educational & Demo Purposes
  - Workshop attendees don't need cloud accounts
  - Demo the system without internet connectivity
  - Training environments without budget concerns
- Privacy-Sensitive Development
  - Test with sensitive data locally
  - Develop features for air-gapped deployments
  - Comply with data sovereignty requirements
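As a rough sketch of the local-embeddings workflow mentioned under Development & Testing (assuming the nomic-embed-text model has been pulled), Ollama's native /api/embeddings endpoint can be exercised directly:

# Request a local embedding; the JSON response contains an "embedding" array
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "What is Lightspeed Stack?"}'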
Benefits
- For Developers:
  - Zero cost for development
  - Faster iteration (no network latency)
  - No quota limits during testing
  - Lower barrier to contribution
- For the Project:
  - Increased adoption (easier onboarding)
  - Better test coverage (developers can test more)
  - Broader contributor base (global accessibility)
  - Production flexibility (hybrid cloud/local deployments)
- For Users:
  - Try before committing to cloud providers
  - Development/staging environments without cloud costs
  - Option for fully private deployments
Example Configuration
examples/ollama-run.yaml:
version: '2'
image_name: local-development-stack
providers:
  inference:
    - provider_id: ollama
      provider_type: remote::ollama
      config:
        base_url: http://localhost:11434
models:
  - model_id: llama3.2
    provider_id: ollama
    model_type: llm
    provider_model_id: llama3.2
  - model_id: nomic-embed-text
    provider_id: ollama
    model_type: embedding
    provider_model_id: nomic-embed-text
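For this configuration to resolve, both referenced models need to be available locally, e.g.:

# Pull the chat model and the embedding model referenced above
ollama pull llama3.2
ollama pull nomic-embed-text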
Describe alternatives you've considered
- Continue cloud-only: creates unnecessary barriers
- Use sentence-transformers only: insufficient, supports only embeddings, not chat
- Support meta-reference (Llama direct): more complex setup than Ollama
- Support multiple local providers: start with Ollama first, expand later
Additional context
- Similar projects (LangChain, LlamaIndex) prominently feature local model support as a first-class option alongside cloud providers.
- Ollama is already listed in llama-stack's provider list; it only needs dependency installation and testing.
This looks like a high-impact, medium-effort improvement that would significantly reduce friction for new contributors and developers. The infrastructure is already in place via llama-stack; we primarily need:
- Dependency addition
- Example configurations
- Documentation updates
- Test validation