
Commit 9938ee5

Add Ollama support
First task to address #784

Signed-off-by: Anik Bhattacharjee <[email protected]>
1 parent 41e89f6 commit 9938ee5

5 files changed: 348 additions & 16 deletions
examples/lightspeed-stack-ollama.yaml

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
# Lightspeed Stack Configuration for Ollama
#
# This configuration file sets up Lightspeed Stack to use Ollama for local LLM inference.
# Works in conjunction with examples/ollama-run.yaml for Llama Stack configuration.
#
# Quick Start:
#   1. Install dependencies: uv sync --group llslibdev
#   2. Install Ollama: https://ollama.com
#   3. Pull a model: ollama pull llama3.2:latest
#   4. Copy configs: cp examples/ollama-run.yaml run.yaml
#                    cp examples/lightspeed-stack-ollama.yaml lightspeed-stack.yaml
#   5. Start server: make run
#
# Deployment Modes:
#   - Library mode (default): Llama Stack runs embedded in Lightspeed process
#   - Remote mode: Llama Stack runs as separate service (requires manual start)
#

name: Lightspeed Core Service (LCS) with Ollama

service:
  host: 0.0.0.0
  port: 8080
  auth_enabled: false
  workers: 1
  color_log: true
  access_log: true

llama_stack:
  # Use Llama Stack as embedded library (single process mode)
  # This starts both Lightspeed Stack and Llama Stack in one process
  use_as_library_client: true
  library_client_config_path: ollama-run.yaml

  # Alternative: Use Llama Stack as separate service (uncomment below and comment above)
  # This requires running "uv run llama stack run examples/ollama-run.yaml" separately
  # use_as_library_client: false
  # url: http://localhost:8321
  # api_key: xyzzy

user_data_collection:
  feedback_enabled: true
  feedback_storage: "/tmp/data/feedback"
  transcripts_enabled: true
  transcripts_storage: "/tmp/data/transcripts"

authentication:
  module: "noop"

inference:
  # Default to the fastest local model
  # Note: Ensure this model is pulled via: ollama pull llama3.2:latest
  default_model: "llama3.2:latest"
  default_provider: "ollama"

# Optional: Configure conversation cache for better performance
# conversation_cache:
#   type: "sqlite"
#   sqlite:
#     db_path: "/tmp/lightspeed-ollama-cache.db"
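Before starting the server with make run, it is worth confirming that the Ollama daemon is reachable and that the configured default_model is actually pulled. The sketch below is illustrative rather than part of this commit; it assumes the ollama Python client (>=0.4.7, added to the llslibdev group in pyproject.toml) and the llama3.2:latest default from the config above, and the check_model helper name is hypothetical.

import ollama


def check_model(model: str = "llama3.2:latest") -> None:
    """Verify the Ollama daemon is reachable and the model is pulled locally."""
    try:
        # ollama.show() raises ResponseError when the model is not available locally.
        ollama.show(model)
        print(f"{model} is available locally")
    except ollama.ResponseError:
        # Equivalent to running: ollama pull llama3.2:latest
        print(f"Pulling {model} ...")
        ollama.pull(model)

    # One tiny round-trip to confirm inference works end to end.
    reply = ollama.chat(model=model, messages=[{"role": "user", "content": "Say hi"}])
    print(reply.message.content)


if __name__ == "__main__":
    check_model()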

examples/ollama-run.yaml

Lines changed: 246 additions & 0 deletions
@@ -0,0 +1,246 @@
# Llama Stack Configuration for Ollama Integration
#
# This configuration enables Lightspeed Stack to use Ollama for local LLM inference.
# Ollama allows running models locally without requiring cloud API keys or internet connectivity.
#
# Prerequisites:
#   1. Install Ollama: https://ollama.com
#   2. Pull at least one model: ollama pull llama3.2:latest
#   3. Ensure Ollama is running: ollama serve (or run Ollama app)
#
# Usage:
#   cp examples/ollama-run.yaml run.yaml
#   cp examples/lightspeed-stack-ollama.yaml lightspeed-stack.yaml
#   make run
#
# ⚠️ KNOWN LIMITATION - AGENTS PROVIDER REQUIRES SAFETY API ⚠️
#
# Current Status: SERVER STARTS ✓ but QUERIES FAIL ✗
#
# The meta-reference agents provider in Llama Stack has a hard dependency on the
# safety API. However, the safety API (llama-guard) appears to require an OpenAI
# provider, creating a circular dependency that prevents pure Ollama-only operation.
#
# Configuration State:
#   - agents API: ENABLED (required by Lightspeed /v1/query endpoint)
#   - safety API: DISABLED (has OpenAI dependency)
#   - Result: Server starts but agents provider cannot initialize without safety
#
# What Actually Works:
#   ✓ Server startup and readiness checks pass
#   ✓ Ollama provider loads and connects to localhost:11434
#   ✓ Embedding models via sentence-transformers
#   ✓ Vector storage with FAISS
#   ✓ Health monitoring endpoints
#
# What's Blocked:
#   ✗ /v1/query endpoint (returns 500 - agents needs safety)
#   ✗ /v1/query_v2 endpoint (same issue)
#   ✗ Streaming query endpoints (same issue)
#   ✗ Shield-based content moderation
#
# Workarounds:
#   1. Add minimal OpenAI config just for safety (hybrid approach)
#   2. Use direct /v1/inference/chat-completion endpoint (if available)
#   3. Wait for Llama Stack fix to make safety optional in agents provider
#
# An issue will be filed with the Llama Stack project to address this dependency.
#
version: '2'
image_name: ollama-llama-stack-configuration

apis:
  - agents          # Required by Lightspeed /v1/query endpoint (but has safety dependency - see below)
  - datasetio
  - eval
  - files
  - inference       # Required - Ollama provider configured here
  - post_training
  # - safety        # DISABLED: llama-guard has OpenAI dependency, blocking agents from working
  - scoring
  - telemetry
  - tool_runtime
  - vector_io

benchmarks: []
container_image: null
datasets: []
external_providers_dir: null

inference_store:
  db_path: .llama/distributions/ollama/inference_store.db
  type: sqlite

logging: null

metadata_store:
  db_path: .llama/distributions/ollama/registry.db
  namespace: null
  type: sqlite

providers:
  files:
    - provider_id: localfs
      provider_type: inline::localfs
      config:
        storage_dir: /tmp/llama-stack-files
        metadata_store:
          type: sqlite
          db_path: .llama/distributions/ollama/files_metadata.db

  agents:
    - provider_id: meta-reference
      provider_type: inline::meta-reference
      config:
        persistence_store:
          db_path: .llama/distributions/ollama/agents_store.db
          namespace: null
          type: sqlite
        responses_store:
          db_path: .llama/distributions/ollama/responses_store.db
          type: sqlite

  datasetio:
    - provider_id: huggingface
      provider_type: remote::huggingface
      config:
        kvstore:
          db_path: .llama/distributions/ollama/huggingface_datasetio.db
          namespace: null
          type: sqlite
    - provider_id: localfs
      provider_type: inline::localfs
      config:
        kvstore:
          db_path: .llama/distributions/ollama/localfs_datasetio.db
          namespace: null
          type: sqlite

  eval:
    - provider_id: meta-reference
      provider_type: inline::meta-reference
      config:
        kvstore:
          db_path: .llama/distributions/ollama/meta_reference_eval.db
          namespace: null
          type: sqlite

  inference:
    # Embedding model for RAG - use sentence-transformers
    - provider_id: sentence-transformers
      provider_type: inline::sentence-transformers
      config: {}
    # Local LLM inference via Ollama
    - provider_id: ollama
      provider_type: remote::ollama
      config:
        url: http://localhost:11434  # Default Ollama port
  post_training:
    - provider_id: huggingface
      provider_type: inline::huggingface-gpu
      config:
        checkpoint_format: huggingface
        device: cpu
        distributed_backend: null
        dpo_output_dir: "."

  # safety:
  #   - provider_id: llama-guard
  #     provider_type: inline::llama-guard
  #     config:
  #       excluded_categories: []

  scoring:
    - provider_id: basic
      provider_type: inline::basic
      config: {}
    # Disabled: These providers require OpenAI
    # - provider_id: llm-as-judge
    #   provider_type: inline::llm-as-judge
    #   config: {}
    # - provider_id: braintrust
    #   provider_type: inline::braintrust
    #   config:
    #     openai_api_key: '********'

  telemetry:
    - provider_id: meta-reference
      provider_type: inline::meta-reference
      config:
        service_name: 'lightspeed-stack-ollama'
        sinks: sqlite
        sqlite_db_path: .llama/distributions/ollama/trace_store.db

  tool_runtime:
    - provider_id: model-context-protocol
      provider_type: remote::model-context-protocol
      config: {}
    - provider_id: rag-runtime
      provider_type: inline::rag-runtime
      config: {}

  vector_io:
    - provider_id: faiss
      provider_type: inline::faiss
      config:
        kvstore:
          db_path: .llama/distributions/ollama/faiss_store.db
          namespace: null
          type: sqlite

scoring_fns: []

server:
  auth: null
  host: null
  port: 8321
  quota: null
  tls_cafile: null
  tls_certfile: null
  tls_keyfile: null

shields: []
# Disabled - llama-guard requires specific Llama Guard models
#   - shield_id: llama-guard-shield
#     provider_id: llama-guard
#     provider_shield_id: "llama3.2:latest"

vector_dbs:
  - vector_db_id: my_knowledge_base
    embedding_model: sentence-transformers/all-mpnet-base-v2
    embedding_dimension: 768
    provider_id: faiss

models:
  # Embedding model for RAG
  - model_id: sentence-transformers/all-mpnet-base-v2
    model_type: embedding
    provider_id: sentence-transformers
    provider_model_id: sentence-transformers/all-mpnet-base-v2
    metadata:
      embedding_dimension: 768

  # Local Ollama models (users must pull these first with: ollama pull <model>)
  # Fast, small model - great for development
  - model_id: llama3.2:latest
    model_type: llm
    provider_id: ollama
    provider_model_id: llama3.2:latest

  # To add more models, first pull them with: ollama pull <model>
  # Then uncomment and configure:
  # - model_id: qwen2.5:7b
  #   model_type: llm
  #   provider_id: ollama
  #   provider_model_id: qwen2.5:7b
  #
  # - model_id: llama3.1:8b
  #   model_type: llm
  #   provider_id: ollama
  #   provider_model_id: llama3.1:8b

tool_groups:
  - toolgroup_id: builtin::rag
    provider_id: rag-runtime
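To reproduce the limitation described in the header comments against a running server on localhost:8080, a diagnostic sketch like the one below can be used. It is not part of this commit: the /v1/readiness path and the {"query": ...} payload shape are assumptions about the Lightspeed API, while /v1/query and the expected HTTP 500 come from the comments above.

import requests

BASE = "http://localhost:8080"

# Readiness should pass even with the safety API disabled (assumed path).
ready = requests.get(f"{BASE}/v1/readiness", timeout=10)
print("readiness:", ready.status_code)

# /v1/query is expected to fail until the agents/safety dependency in
# Llama Stack is resolved or a safety provider is configured.
resp = requests.post(f"{BASE}/v1/query", json={"query": "Hello from Ollama"}, timeout=60)
print("query:", resp.status_code)
if resp.status_code != 200:
    print(resp.text[:500])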

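The vector_dbs entry above hard-codes embedding_dimension: 768 for all-mpnet-base-v2. If a different embedding model is substituted, that value has to match the model's output dimension or FAISS inserts will generally fail. A quick check, assuming the sentence-transformers package used by the inline provider is installed:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
print(model.get_sentence_embedding_dimension())  # 768, must match embedding_dimension
print(model.encode("dimension check").shape)     # (768,)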
pyproject.toml

Lines changed: 3 additions & 0 deletions
@@ -158,6 +158,9 @@ llslibdev = [
     "opentelemetry-instrumentation>=0.55b0",
     "blobfile>=3.0.0",
     "psutil>=7.0.0",
+    # API inference: remote::ollama
+    "ollama>=0.4.7",
+    "h11>=0.16.0",
 ]

 build = [
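These entries are pulled in so the embedded Llama Stack can load the remote::ollama provider in library mode. A minimal sanity check after uv sync --group llslibdev; this snippet is illustrative only:

from importlib.metadata import version

# Raises PackageNotFoundError if the llslibdev group was not synced.
for pkg in ("ollama", "h11"):
    print(pkg, version(pkg))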

src/app/endpoints/query.py

Lines changed: 23 additions & 16 deletions
@@ -687,22 +687,29 @@ async def retrieve_response( # pylint: disable=too-many-locals,too-many-branche
         a summary of the LLM or agent's response
         content, the conversation ID, the list of parsed referenced documents, and token usage information.
     """
-    available_input_shields = [
-        shield.identifier
-        for shield in filter(is_input_shield, await client.shields.list())
-    ]
-    available_output_shields = [
-        shield.identifier
-        for shield in filter(is_output_shield, await client.shields.list())
-    ]
-    if not available_input_shields and not available_output_shields:
-        logger.info("No available shields. Disabling safety")
-    else:
-        logger.info(
-            "Available input shields: %s, output shields: %s",
-            available_input_shields,
-            available_output_shields,
-        )
+    # Try to get available shields, but gracefully handle if safety API is not available
+    try:
+        available_input_shields = [
+            shield.identifier
+            for shield in filter(is_input_shield, await client.shields.list())
+        ]
+        available_output_shields = [
+            shield.identifier
+            for shield in filter(is_output_shield, await client.shields.list())
+        ]
+        if not available_input_shields and not available_output_shields:
+            logger.info("No available shields. Disabling safety")
+        else:
+            logger.info(
+                "Available input shields: %s, output shields: %s",
+                available_input_shields,
+                available_output_shields,
+            )
+    except (ValueError, KeyError) as e:
+        # Safety API not available (e.g., when using minimal Ollama configuration)
+        logger.info("Safety API not available, disabling shields: %s", e)
+        available_input_shields = []
+        available_output_shields = []
     # use system prompt from request or default one
     system_prompt = get_system_prompt(query_request, configuration)
     logger.debug("Using system prompt: %s", system_prompt)