
Conversation

@anik120
Contributor

@anik120 anik120 commented Nov 15, 2025

First task to address #784

Description

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Integrated Ollama for local LLM inference with configuration templates for quick setup
    • Improved resilience by gracefully handling unavailable safety services
  • Chores

    • Added Ollama and HTTP library dependencies

First task to address lightspeed-core#784

Signed-off-by: Anik Bhattacharjee <[email protected]>
@coderabbitai
Contributor

coderabbitai bot commented Nov 15, 2025

Walkthrough

Adds Ollama integration support through new YAML configuration files for Lightspeed Stack and Llama Stack setup, introduces two dependencies (ollama and h11), and implements graceful error handling for unavailable safety APIs in the query endpoint by defaulting to empty shield lists on failure.

Changes

  • Configuration files for Ollama integration — examples/lightspeed-stack-ollama.yaml, examples/ollama-run.yaml
    New YAML configuration files defining complete Lightspeed Stack and Llama Stack setups for Ollama-based local LLM inference, including server parameters, authentication, providers, model configurations, and storage paths.
  • Dependency updates — pyproject.toml
    Added ollama>=0.4.7 and h11>=0.16.0 to the llslibdev dependency group.
  • Error handling for safety API — src/app/endpoints/query.py
    Wrapped shield availability discovery in try/except to handle cases when the safety API is unavailable, defaulting to empty shield lists instead of raising errors.

Sequence Diagram

sequenceDiagram
    participant client as Client
    participant query as query.py
    participant safety as Safety API
    
    client->>query: retrieve_response()
    
    rect rgb(240, 248, 255)
    Note over query,safety: New error handling
    query->>safety: Fetch available shields
    
    alt Success
        safety-->>query: Return shields
        query->>query: Collect & log shields
    else Safety API unavailable
        safety--x query: ValueError/KeyError
        query->>query: Log warning
        query->>query: Set shields to empty list
    end
    end
    
    query-->>client: Response (shields enabled or disabled)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify YAML configuration files follow the expected schema and reference correct paths (ollama-run.yaml is referenced in lightspeed-stack-ollama.yaml); a sketch of this cross-reference follows the list
  • Confirm error handling logic in query.py correctly catches all relevant exceptions from safety API
  • Validate that disabling shields gracefully is the intended behavior when safety API is unavailable
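For reference, the cross-reference being verified likely looks something like the fragment below. Only the library_client_config_path key is quoted later in this review; the surrounding keys are assumptions about the lightspeed-stack schema, so check them against the actual example file.

llama_stack:
  use_as_library_client: true                    # assumed key: run Llama Stack in-process
  library_client_config_path: ollama-run.yaml    # must resolve from the server's working directory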

Suggested reviewers

  • tisnik
  • matysek

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
  • Title check — ❓ Inconclusive
    Explanation: The title 'WIP: Add Ollama support' is vague and generic. While 'Ollama support' is mentioned in the changeset, the title lacks specificity about what aspect of Ollama integration is being added.
    Resolution: Replace with a more specific title describing the primary change, such as 'Add Ollama integration with example configurations' or 'Introduce Ollama-based local LLM inference support.'
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped because CodeRabbit’s high-level summary is enabled.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci

openshift-ci bot commented Nov 15, 2025

Hi @anik120. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
examples/ollama-run.yaml (2)

16-47: Track the safety API limitation with an upstream issue.

The documentation clearly outlines a critical limitation: queries fail because the meta-reference agents provider requires the safety API, which has an OpenAI dependency. While the documentation is excellent and the workarounds are reasonable, this blocker should be tracked.

As noted in the documentation (line 47), would you like me to help draft an issue for the Llama Stack project to make the safety API optional in the agents provider? This would enable pure Ollama-only operation.


53-64: Consider documenting the workaround path.

The configuration enables the agents API (line 54) despite the known safety dependency issue. While this is necessary for Lightspeed's /v1/query endpoint, the configuration doesn't explicitly document which workaround is being used.

Consider adding a comment near line 54 to clarify the current status, such as:

 apis:
-  - agents  # Required by Lightspeed /v1/query endpoint (but has safety dependency - see below)
+  - agents  # Required by Lightspeed /v1/query endpoint
+            # NOTE: This now works with the graceful degradation added in query.py
+            # that defaults to empty shields when safety API is unavailable

This would help future maintainers understand why this works despite the documented limitation.

examples/lightspeed-stack-ollama.yaml (1)

46-47: Document that noop authentication is for development only.

The configuration uses the noop authentication module, which provides no security. While that is appropriate for local development with Ollama, it should be clearly marked as unsafe for production.

Consider adding a security warning comment:

 authentication:
+  # WARNING: noop auth provides NO SECURITY - for development only!
+  # For production, use a proper authentication module
   module: "noop"
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41e89f6 and 9938ee5.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • examples/lightspeed-stack-ollama.yaml (1 hunks)
  • examples/ollama-run.yaml (1 hunks)
  • pyproject.toml (1 hunks)
  • src/app/endpoints/query.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)
🔇 Additional comments (3)
examples/lightspeed-stack-ollama.yaml (2)

1-59: Excellent documentation and structure for Ollama integration.

The configuration file is well-organized with:

  • Clear quick start instructions
  • Deployment modes explained
  • Appropriate defaults for local development
  • Good integration with ollama-run.yaml

The example configuration provides a solid foundation for users to get started with Ollama integration.

Minor suggestion: Consider uncommenting the conversation cache section (lines 55-59) by default with the /tmp path, as caching can significantly improve the development experience.
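If that suggestion is taken, the uncommented block might look roughly like the sketch below. The key names here are assumptions about the lightspeed-stack conversation cache schema, not a copy of lines 55-59, so defer to the commented-out section in the example file itself.

conversation_cache:
  type: sqlite                                   # assumed backend name
  sqlite:
    db_path: /tmp/lightspeed-stack-cache.db      # throwaway path, fine for local development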


28-32: Document working directory requirement for relative config paths.

The library_client_config_path: ollama-run.yaml uses a bare relative path that resolves from the current working directory when the Configuration is loaded. The validation in src/models/config.py (line 218) calls checks.file_check(), which uses os.path.isfile() and os.access() to resolve relative paths from CWD. If the server starts from a directory other than where ollama-run.yaml resides, this will fail.

Add documentation in the quick start guide (lines 6-12) to clarify that commands must run from the repository root, or update the example to use an absolute path or a path relative to a known location.
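A minimal sketch of why a bare relative path is CWD-sensitive, assuming file_check() reduces to the two os calls named above (the helper below is illustrative, not the project's actual implementation):

import os

def file_check(path: str) -> bool:
    # os.path.isfile() and os.access() resolve relative paths against os.getcwd()
    return os.path.isfile(path) and os.access(path, os.R_OK)

print(os.getcwd())                    # whatever directory the server was started from
print(file_check("ollama-run.yaml"))  # True only if that directory contains the file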

pyproject.toml (1)

161-163: Package versions verified and secure.

Both packages exist on PyPI and are properly specified (an illustrative pyproject fragment follows the list):

  • ollama (>=0.4.7): No known vulnerabilities found.
  • h11 (>=0.16.0): A critical vulnerability exists in versions < 0.16.0; the declared lower bound of >=0.16.0 excludes those vulnerable releases, so only patched versions can be installed.
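For orientation, the addition probably looks something like the fragment below. The group name llslibdev comes from the summary above; the exact table layout in pyproject.toml is an assumption.

[dependency-groups]
llslibdev = [
    # ...existing entries...
    "ollama>=0.4.7",   # Ollama Python client for local inference
    "h11>=0.16.0",     # lower bound above the vulnerable < 0.16.0 releases
]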

Comment on lines +227 to +230
- model_id: llama3.2:latest
  model_type: llm
  provider_id: ollama
  provider_model_id: llama3.2:latest

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Verify the Ollama model identifier format.

The model configuration uses llama3.2:latest as both model_id and provider_model_id. Ensure this matches Ollama's model naming conventions.

Additionally, verify that users can successfully pull this model:


🏁 Script executed:

#!/bin/bash
# Description: Check if llama3.2:latest is a valid Ollama model name

echo "Checking Ollama model availability..."
echo "Users should verify with: ollama pull llama3.2:latest"
echo ""
echo "Checking if model naming follows Ollama conventions..."
# This would need to be run on a system with Ollama installed

Length of output: 147


🌐 Web query:

What is the correct model identifier format for Ollama models in llama-stack-client?

💡 Result:

Use the Ollama provider prefix: ollama/.
Example: ollama/llama3.2:3b (provider_resource_id will be the Ollama name like llama3.2:3b-instruct-fp16). (llama-stack.readthedocs.io)


Fix model_id to use Ollama provider prefix format.

The model_id should follow the documented llama-stack format with the ollama/ prefix. The correct format is ollama/<ollama-model-name> (e.g., ollama/llama3.2:3b).

Update line 227 (a corrected fragment follows the list):

  • Change model_id: llama3.2:latest to model_id: ollama/llama3.2:latest
  • Keep provider_model_id: llama3.2:latest (this is the raw Ollama model name)
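Applied to the snippet above, the suggested entry would read as follows (assuming the ollama/<model> convention from the llama-stack docs applies to this file):

- model_id: ollama/llama3.2:latest
  model_type: llm
  provider_id: ollama
  provider_model_id: llama3.2:latest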
🤖 Prompt for AI Agents
In examples/ollama-run.yaml around lines 227 to 230, the model_id is missing the
required Ollama provider prefix; change the model_id from "llama3.2:latest" to
"ollama/llama3.2:latest" while leaving provider_model_id as "llama3.2:latest" so
the file uses the documented llama-stack format (ollama/<ollama-model-name>) and
retains the raw provider model name.

Comment on lines +690 to +712
# Try to get available shields, but gracefully handle if safety API is not available
try:
    available_input_shields = [
        shield.identifier
        for shield in filter(is_input_shield, await client.shields.list())
    ]
    available_output_shields = [
        shield.identifier
        for shield in filter(is_output_shield, await client.shields.list())
    ]
    if not available_input_shields and not available_output_shields:
        logger.info("No available shields. Disabling safety")
    else:
        logger.info(
            "Available input shields: %s, output shields: %s",
            available_input_shields,
            available_output_shields,
        )
except (ValueError, KeyError) as e:
    # Safety API not available (e.g., when using minimal Ollama configuration)
    logger.info("Safety API not available, disabling shields: %s", e)
    available_input_shields = []
    available_output_shields = []

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the correct exceptions to catch for shield list failures.

The try/except block catches ValueError and KeyError for shield discovery failures. While this provides graceful degradation when the safety API is unavailable (important for Ollama), these exception types are quite broad and might inadvertently suppress other bugs.

Run the following script to check what exceptions the shields.list() method can raise:

Additionally, consider checking the llama-stack-client documentation for the expected exceptions:

If llama-stack-client defines a specific exception for API unavailability (e.g., APINotAvailableError), catching that would be more precise than catching ValueError and KeyError.


🏁 Script executed:

#!/bin/bash
# Description: Find the shields.list() implementation and its exception handling

# Search for shields.list implementation in llama-stack-client
ast-grep --pattern $'async def list($$$):
  $$$'

# Also search for shield-related exception handling
rg -nP --type=py -C5 'shields\.list\(\)' -g '!test*'

# Look for custom exceptions in llama_stack_client
rg -nP --type=py 'class.*Exception|class.*Error' -g '*llama_stack*' -A3

Length of output: 3177


🌐 Web query:

What exceptions can llama_stack_client.shields.list() raise?

💡 Result:

client.shields.list() can raise the standard llama_stack_client API exceptions (same error model used across the SDK):

  • Base: APIError. [1]
  • Connection / network: APIConnectionError (includes timeouts). [2]
  • Non-success HTTP responses: APIStatusError and its status-specific subclasses: BadRequestError (400), AuthenticationError (401), PermissionDeniedError (403), NotFoundError (404), UnprocessableEntityError (422), RateLimitError (429), InternalServerError (>=500). [2]

(These are the documented exceptions the Python client surface uses for calls such as client.shields.list().) [1][2]

Sources:
[1] Python SDK reference — llama-stack (client.shields.list).
[2] llama-stack-client package error handling (PyPI / docs).


Catch the correct exceptions from llama-stack-client instead of generic Python exceptions.

Based on the llama-stack-client documentation, shields.list() raises APIError (base), APIConnectionError for connection/network issues, and APIStatusError with status-specific subclasses—not ValueError or KeyError. The current exception handling is too broad and will suppress legitimate API errors.

Reference the pattern already established in src/app/endpoints/shields.py (line 78), which correctly catches APIConnectionError. Update src/app/endpoints/query.py lines 691–712 to catch APIConnectionError (or APIError as the base class) instead of ValueError and KeyError.

Additionally, src/app/endpoints/streaming_query.py lines 1016–1023 calls shields.list() without any exception handling; add a try/except block there as well.
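A sketch of the suggested change, assuming the error classes are importable from the llama_stack_client package root as the SDK error model quoted above describes (APIConnectionError and APIStatusError both derive from APIError, so catching the base class covers both):

from llama_stack_client import APIError  # base class per the SDK error model above

try:
    shields = await client.shields.list()
    available_input_shields = [
        shield.identifier for shield in filter(is_input_shield, shields)
    ]
    available_output_shields = [
        shield.identifier for shield in filter(is_output_shield, shields)
    ]
except APIError as e:
    # Safety API unreachable or erroring (e.g. minimal Ollama setup):
    # degrade gracefully instead of failing the whole query
    logger.warning("Safety API not available, disabling shields: %s", e)
    available_input_shields = []
    available_output_shields = []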

🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 690 to 712, the exception handler
currently catches ValueError and KeyError from client.shields.list() calls;
replace those with the llama-stack-client exceptions (catch APIConnectionError
for connection issues or APIError as the base class) so real API errors are not
silently suppressed, and keep the same fallback behavior (log a clear message
including the exception and set available_input_shields and
available_output_shields to empty lists). Also update
src/app/endpoints/streaming_query.py around lines 1016 to 1023 to wrap the
shields.list() call in a similar try/except that catches APIConnectionError or
APIError, logs the exception, and falls back to an empty shields list so the
code behaves consistently when the Safety API is unavailable.

@anik120
Contributor Author

anik120 commented Nov 15, 2025

/hold

