
Conversation

@anik120
Contributor

@anik120 anik120 commented Nov 15, 2025

First task to address #784

Description

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Integrated Ollama for local LLM inference with configuration templates for quick setup
    • Improved resilience by gracefully handling unavailable safety services
  • Chores

    • Added Ollama and HTTP library dependencies

First task to address lightspeed-core#784

Signed-off-by: Anik Bhattacharjee <[email protected]>
@coderabbitai
Contributor

coderabbitai bot commented Nov 15, 2025

Walkthrough

Adds Ollama integration support through new YAML configuration files for Lightspeed Stack and Llama Stack setup, introduces two dependencies (ollama and h11), and implements graceful error handling for unavailable safety APIs in the query endpoint by defaulting to empty shield lists on failure.

Changes

  • Configuration files for Ollama integration — examples/lightspeed-stack-ollama.yaml, examples/ollama-run.yaml
    New YAML configuration files defining complete Lightspeed Stack and Llama Stack setups for Ollama-based local LLM inference, including server parameters, authentication, providers, model configurations, and storage paths.
  • Dependency updates — pyproject.toml
    Added ollama>=0.4.7 and h11>=0.16.0 to the llslibdev dependency group.
  • Error handling for safety API — src/app/endpoints/query.py
    Wrapped shield availability discovery in try/except to handle cases when the safety API is unavailable, defaulting to empty shield lists instead of raising errors.

Sequence Diagram

sequenceDiagram
    participant client as Client
    participant query as query.py
    participant safety as Safety API
    
    client->>query: retrieve_response()
    
    rect rgb(240, 248, 255)
    Note over query,safety: New error handling
    query->>safety: Fetch available shields
    
    alt Success
        safety-->>query: Return shields
        query->>query: Collect & log shields
    else Safety API unavailable
        safety--x query: ValueError/KeyError
        query->>query: Log warning
        query->>query: Set shields to empty list
    end
    end
    
    query-->>client: Response (shields enabled or disabled)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify YAML configuration files follow the expected schema and reference correct paths (ollama-run.yaml is referenced in lightspeed-stack-ollama.yaml); a sketch of this cross-reference follows the list
  • Confirm error handling logic in query.py correctly catches all relevant exceptions from safety API
  • Validate that disabling shields gracefully is the intended behavior when safety API is unavailable
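For reference, the cross-reference being verified likely looks something like the fragment below. Only the library_client_config_path key is quoted later in this review; the surrounding keys are assumptions about the lightspeed-stack schema, so check them against the actual example file.

llama_stack:
  use_as_library_client: true                    # assumed key: run Llama Stack in-process
  library_client_config_path: ollama-run.yaml    # must resolve from the server's working directory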

Suggested reviewers

  • tisnik
  • matysek

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
  • Title check — ❓ Inconclusive
    Explanation: The title 'WIP: Add Ollama support' is vague and generic. While 'Ollama support' is mentioned in the changeset, the title lacks specificity about what aspect of Ollama integration is being added.
    Resolution: Replace with a more specific title describing the primary change, such as 'Add Ollama integration with example configurations' or 'Introduce Ollama-based local LLM inference support.'
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped because CodeRabbit’s high-level summary is enabled.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci

openshift-ci bot commented Nov 15, 2025

Hi @anik120. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
examples/ollama-run.yaml (2)

16-47: Track the safety API limitation with an upstream issue.

The documentation clearly outlines a critical limitation: queries fail because the meta-reference agents provider requires the safety API, which has an OpenAI dependency. While the documentation is excellent and the workarounds are reasonable, this blocker should be tracked.

As noted in the documentation (line 47), would you like me to help draft an issue for the Llama Stack project to make the safety API optional in the agents provider? This would enable pure Ollama-only operation.


53-64: Consider documenting the workaround path.

The configuration enables the agents API (line 54) despite the known safety dependency issue. While this is necessary for Lightspeed's /v1/query endpoint, the configuration doesn't explicitly document which workaround is being used.

Consider adding a comment near line 54 to clarify the current status, such as:

 apis:
-  - agents  # Required by Lightspeed /v1/query endpoint (but has safety dependency - see below)
+  - agents  # Required by Lightspeed /v1/query endpoint
+            # NOTE: This now works with the graceful degradation added in query.py
+            # that defaults to empty shields when safety API is unavailable

This would help future maintainers understand why this works despite the documented limitation.

examples/lightspeed-stack-ollama.yaml (1)

46-47: Document that noop authentication is for development only.

The configuration uses the noop authentication module, which provides no security. While that is appropriate for local development with Ollama, it should be clearly marked as unsafe for production.

Consider adding a security warning comment:

 authentication:
+  # WARNING: noop auth provides NO SECURITY - for development only!
+  # For production, use a proper authentication module
   module: "noop"
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41e89f6 and 9938ee5.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • examples/lightspeed-stack-ollama.yaml (1 hunks)
  • examples/ollama-run.yaml (1 hunks)
  • pyproject.toml (1 hunks)
  • src/app/endpoints/query.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)
🔇 Additional comments (3)
examples/lightspeed-stack-ollama.yaml (2)

1-59: Excellent documentation and structure for Ollama integration.

The configuration file is well-organized with:

  • Clear quick start instructions
  • Deployment modes explained
  • Appropriate defaults for local development
  • Good integration with ollama-run.yaml

The example configuration provides a solid foundation for users to get started with Ollama integration.

Minor suggestion: Consider uncommenting the conversation cache section (lines 55-59) by default with the /tmp path, as caching can significantly improve the development experience.
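If that suggestion is taken, the uncommented block might look roughly like the sketch below. The key names here are assumptions about the lightspeed-stack conversation cache schema, not a copy of lines 55-59, so defer to the commented-out section in the example file itself.

conversation_cache:
  type: sqlite                                   # assumed backend name
  sqlite:
    db_path: /tmp/lightspeed-stack-cache.db      # throwaway path, fine for local development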


28-32: Document working directory requirement for relative config paths.

The library_client_config_path: ollama-run.yaml uses a bare relative path that resolves from the current working directory when the Configuration is loaded. The validation in src/models/config.py (line 218) calls checks.file_check(), which uses os.path.isfile() and os.access() to resolve relative paths from CWD. If the server starts from a directory other than where ollama-run.yaml resides, this will fail.

Add documentation in the quick start guide (lines 6-12) to clarify that commands must run from the repository root, or update the example to use an absolute path or a path relative to a known location.
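A minimal sketch of why a bare relative path is CWD-sensitive, assuming file_check() reduces to the two os calls named above (the helper below is illustrative, not the project's actual implementation):

import os

def file_check(path: str) -> bool:
    # os.path.isfile() and os.access() resolve relative paths against os.getcwd()
    return os.path.isfile(path) and os.access(path, os.R_OK)

print(os.getcwd())                    # whatever directory the server was started from
print(file_check("ollama-run.yaml"))  # True only if that directory contains the file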

pyproject.toml (1)

161-163: Package versions verified and secure.

Both packages exist on PyPI and are properly specified (an illustrative pyproject fragment follows the list):

  • ollama (>=0.4.7): No known vulnerabilities found.
  • h11 (>=0.16.0): A critical vulnerability exists in versions < 0.16.0; the declared lower bound of >=0.16.0 excludes those vulnerable releases, so only patched versions can be installed.
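For orientation, the addition probably looks something like the fragment below. The group name llslibdev comes from the summary above; the exact table layout in pyproject.toml is an assumption.

[dependency-groups]
llslibdev = [
    # ...existing entries...
    "ollama>=0.4.7",   # Ollama Python client for local inference
    "h11>=0.16.0",     # lower bound above the vulnerable < 0.16.0 releases
]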

Comment on lines +227 to +230
- model_id: llama3.2:latest
  model_type: llm
  provider_id: ollama
  provider_model_id: llama3.2:latest

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Verify the Ollama model identifier format.

The model configuration uses llama3.2:latest as both model_id and provider_model_id. Ensure this matches Ollama's model naming conventions.

Additionally, verify that users can successfully pull this model:


🏁 Script executed:

#!/bin/bash
# Description: Check if llama3.2:latest is a valid Ollama model name

echo "Checking Ollama model availability..."
echo "Users should verify with: ollama pull llama3.2:latest"
echo ""
echo "Checking if model naming follows Ollama conventions..."
# This would need to be run on a system with Ollama installed

Length of output: 147


🌐 Web query:

What is the correct model identifier format for Ollama models in llama-stack-client?

💡 Result:

Use the Ollama provider prefix: ollama/.
Example: ollama/llama3.2:3b (provider_resource_id will be the Ollama name like llama3.2:3b-instruct-fp16). (llama-stack.readthedocs.io)


Fix model_id to use Ollama provider prefix format.

The model_id should follow the documented llama-stack format with the ollama/ prefix. The correct format is ollama/<ollama-model-name> (e.g., ollama/llama3.2:3b).

Update line 227 (a corrected fragment follows the list):

  • Change model_id: llama3.2:latest to model_id: ollama/llama3.2:latest
  • Keep provider_model_id: llama3.2:latest (this is the raw Ollama model name)
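Applied to the snippet above, the suggested entry would read as follows (assuming the ollama/<model> convention from the llama-stack docs applies to this file):

- model_id: ollama/llama3.2:latest
  model_type: llm
  provider_id: ollama
  provider_model_id: llama3.2:latest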
🤖 Prompt for AI Agents
In examples/ollama-run.yaml around lines 227 to 230, the model_id is missing the
required Ollama provider prefix; change the model_id from "llama3.2:latest" to
"ollama/llama3.2:latest" while leaving provider_model_id as "llama3.2:latest" so
the file uses the documented llama-stack format (ollama/<ollama-model-name>) and
retains the raw provider model name.

Comment on lines +690 to +712
# Try to get available shields, but gracefully handle if safety API is not available
try:
    available_input_shields = [
        shield.identifier
        for shield in filter(is_input_shield, await client.shields.list())
    ]
    available_output_shields = [
        shield.identifier
        for shield in filter(is_output_shield, await client.shields.list())
    ]
    if not available_input_shields and not available_output_shields:
        logger.info("No available shields. Disabling safety")
    else:
        logger.info(
            "Available input shields: %s, output shields: %s",
            available_input_shields,
            available_output_shields,
        )
except (ValueError, KeyError) as e:
    # Safety API not available (e.g., when using minimal Ollama configuration)
    logger.info("Safety API not available, disabling shields: %s", e)
    available_input_shields = []
    available_output_shields = []

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the correct exceptions to catch for shield list failures.

The try/except block catches ValueError and KeyError for shield discovery failures. While this provides graceful degradation when the safety API is unavailable (important for Ollama), these exception types are quite broad and might inadvertently suppress other bugs.

Run the following script to check what exceptions the shields.list() method can raise:

Additionally, consider checking the llama-stack-client documentation for the expected exceptions:

If llama-stack-client defines a specific exception for API unavailability (e.g., APINotAvailableError), catching that would be more precise than catching ValueError and KeyError.


🏁 Script executed:

#!/bin/bash
# Description: Find the shields.list() implementation and its exception handling

# Search for shields.list implementation in llama-stack-client
ast-grep --pattern $'async def list($$$):
  $$$'

# Also search for shield-related exception handling
rg -nP --type=py -C5 'shields\.list\(\)' -g '!test*'

# Look for custom exceptions in llama_stack_client
rg -nP --type=py 'class.*Exception|class.*Error' -g '*llama_stack*' -A3

Length of output: 3177


🌐 Web query:

What exceptions can llama_stack_client.shields.list() raise?

💡 Result:

client.shields.list() can raise the standard llama_stack_client API exceptions (same error model used across the SDK):

  • Base: APIError. [1]
  • Connection / network: APIConnectionError (includes timeouts). [2]
  • Non-success HTTP responses: APIStatusError and its status-specific subclasses: BadRequestError (400), AuthenticationError (401), PermissionDeniedError (403), NotFoundError (404), UnprocessableEntityError (422), RateLimitError (429), InternalServerError (>=500). [2]

(These are the documented exceptions the Python client surface uses for calls such as client.shields.list().) [1][2]

Sources:
[1] Python SDK reference — llama-stack (client.shields.list).
[2] llama-stack-client package error handling (PyPI / docs).


Catch the correct exceptions from llama-stack-client instead of generic Python exceptions.

Based on the llama-stack-client documentation, shields.list() raises APIError (base), APIConnectionError for connection/network issues, and APIStatusError with status-specific subclasses—not ValueError or KeyError. The current exception handling is too broad and will suppress legitimate API errors.

Reference the pattern already established in src/app/endpoints/shields.py (line 78), which correctly catches APIConnectionError. Update src/app/endpoints/query.py lines 691–712 to catch APIConnectionError (or APIError as the base class) instead of ValueError and KeyError.

Additionally, src/app/endpoints/streaming_query.py lines 1016–1023 calls shields.list() without any exception handling; add a try/except block there as well.
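A sketch of the suggested change, assuming the error classes are importable from the llama_stack_client package root as the SDK error model quoted above describes (APIConnectionError and APIStatusError both derive from APIError, so catching the base class covers both):

from llama_stack_client import APIError  # base class per the SDK error model above

try:
    shields = await client.shields.list()
    available_input_shields = [
        shield.identifier for shield in filter(is_input_shield, shields)
    ]
    available_output_shields = [
        shield.identifier for shield in filter(is_output_shield, shields)
    ]
except APIError as e:
    # Safety API unreachable or erroring (e.g. minimal Ollama setup):
    # degrade gracefully instead of failing the whole query
    logger.warning("Safety API not available, disabling shields: %s", e)
    available_input_shields = []
    available_output_shields = []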

🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 690 to 712, the exception handler
currently catches ValueError and KeyError from client.shields.list() calls;
replace those with the llama-stack-client exceptions (catch APIConnectionError
for connection issues or APIError as the base class) so real API errors are not
silently suppressed, and keep the same fallback behavior (log a clear message
including the exception and set available_input_shields and
available_output_shields to empty lists). Also update
src/app/endpoints/streaming_query.py around lines 1016 to 1023 to wrap the
shields.list() call in a similar try/except that catches APIConnectionError or
APIError, logs the exception, and falls back to an empty shields list so the
code behaves consistently when the Safety API is unavailable.

@anik120
Contributor Author

anik120 commented Nov 15, 2025

/hold

