WIP: Add Ollama support #795
base: main
Conversation
First task to address lightspeed-core#784 Signed-off-by: Anik Bhattacharjee <[email protected]>
Walkthrough
Adds Ollama integration support through new YAML configuration files for Lightspeed Stack and Llama Stack setup, introduces two dependencies (ollama and h11), and implements graceful error handling for unavailable safety APIs in the query endpoint by defaulting to empty shield lists on failure.
Sequence Diagram
sequenceDiagram
participant client as Client
participant query as query.py
participant safety as Safety API
client->>query: retrieve_response()
rect rgb(240, 248, 255)
Note over query,safety: New error handling
query->>safety: Fetch available shields
alt Success
safety-->>query: Return shields
query->>query: Collect & log shields
else Safety API unavailable
safety--x query: ValueError/KeyError
query->>query: Log warning
query->>query: Set shields to empty list
end
end
query-->>client: Response (shields enabled or disabled)
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Suggested reviewers
Pre-merge checks and finishing touches
❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Hi @anik120. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with the appropriate command; once the patch is verified, the new status will be reflected by the corresponding label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Actionable comments posted: 2
🧹 Nitpick comments (3)
examples/ollama-run.yaml (2)
16-47: Track the safety API limitation with an upstream issue.
The documentation clearly outlines a critical limitation: queries fail because the meta-reference agents provider requires the safety API, which has an OpenAI dependency. While the documentation is excellent and the workarounds are reasonable, this blocker should be tracked.
As noted in the documentation (line 47), would you like me to help draft an issue for the Llama Stack project to make the safety API optional in the agents provider? This would enable pure Ollama-only operation.
53-64: Consider documenting the workaround path.
The configuration enables the agents API (line 54) despite the known safety dependency issue. While this is necessary for Lightspeed's /v1/query endpoint, the configuration doesn't explicitly document which workaround is being used. Consider adding a comment near line 54 to clarify the current status, such as:
apis:
-  - agents # Required by Lightspeed /v1/query endpoint (but has safety dependency - see below)
+  - agents # Required by Lightspeed /v1/query endpoint
+  # NOTE: This now works with the graceful degradation added in query.py
+  # that defaults to empty shields when safety API is unavailable
This would help future maintainers understand why this works despite the documented limitation.
examples/lightspeed-stack-ollama.yaml (1)
46-47: Document that noop authentication is for development only.
The configuration uses the noop authentication module, which provides no security. While appropriate for local development with Ollama, this should be clearly marked as unsafe for production. Consider adding a security warning comment:
authentication:
+  # WARNING: noop auth provides NO SECURITY - for development only!
+  # For production, use a proper authentication module
   module: "noop"
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
- examples/lightspeed-stack-ollama.yaml (1 hunks)
- examples/ollama-run.yaml (1 hunks)
- pyproject.toml (1 hunks)
- src/app/endpoints/query.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-pr
- GitHub Check: e2e_tests (ci)
- GitHub Check: e2e_tests (azure)
🔇 Additional comments (3)
examples/lightspeed-stack-ollama.yaml (2)
1-59: Excellent documentation and structure for Ollama integration.
The configuration file is well-organized with:
- Clear quick start instructions
- Deployment modes explained
- Appropriate defaults for local development
- Good integration with ollama-run.yaml
The example configuration provides a solid foundation for users to get started with Ollama integration.
Minor suggestion: Consider uncommenting the conversation cache section (lines 55-59) by default with the /tmp path, as caching can significantly improve the development experience.
28-32: Document working directory requirement for relative config paths.
The library_client_config_path: ollama-run.yaml setting uses a bare relative path that resolves from the current working directory when the configuration is loaded. The validation in src/models/config.py (line 218) calls checks.file_check(), which uses os.path.isfile() and os.access() to resolve relative paths from CWD. If the server starts from a directory other than where ollama-run.yaml resides, this will fail.
Add documentation in the quick start guide (lines 6-12) to clarify that commands must run from the repository root, or update the example to use an absolute path or a path relative to a known location.
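For illustration, a minimal sketch of the CWD-relative behavior described above (the is_readable_file helper and repo_root anchor are hypothetical, not the actual checks.file_check() code):

import os

def is_readable_file(path: str) -> bool:
    # Same kind of check as described above: existence plus read permission.
    # Relative paths are resolved against os.getcwd(), not the config file's location.
    return os.path.isfile(path) and os.access(path, os.R_OK)

# True only when the server is started from the directory containing the file:
print(is_readable_file("ollama-run.yaml"))

# Anchoring the path to a known location removes the CWD dependence:
repo_root = os.path.dirname(os.path.abspath(__file__))  # hypothetical anchor
print(is_readable_file(os.path.join(repo_root, "examples", "ollama-run.yaml")))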
pyproject.toml (1)
161-163: Package versions verified and secure.
Both packages exist on PyPI and are properly specified:
- ollama (>=0.4.7): No known vulnerabilities found.
- h11 (>=0.16.0): A critical vulnerability exists in versions < 0.16.0, but the pinned requirement correctly specifies >=0.16.0, which excludes vulnerable versions and ensures only patched versions are used.
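For reference, a sketch of how these pins might appear in pyproject.toml (placement within the dependency list is assumed; only the two version constraints come from this review):

dependencies = [
    # ...existing dependencies...
    "ollama>=0.4.7",  # Ollama Python client
    "h11>=0.16.0",    # >=0.16.0 excludes the releases affected by the known critical vulnerability
]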
- model_id: llama3.2:latest
  model_type: llm
  provider_id: ollama
  provider_model_id: llama3.2:latest
🧩 Analysis chain
Verify the Ollama model identifier format.
The model configuration uses llama3.2:latest as both model_id and provider_model_id. Ensure this matches Ollama's model naming conventions.
Additionally, verify that users can successfully pull this model:
🏁 Script executed:
#!/bin/bash
# Description: Check if llama3.2:latest is a valid Ollama model name
echo "Checking Ollama model availability..."
echo "Users should verify with: ollama pull llama3.2:latest"
echo ""
echo "Checking if model naming follows Ollama conventions..."
# This would need to be run on a system with Ollama installed
Length of output: 147
🌐 Web query:
What is the correct model identifier format for Ollama models in llama-stack-client?
💡 Result:
Use the Ollama provider prefix: ollama/.
Example: ollama/llama3.2:3b (provider_resource_id will be the Ollama name like llama3.2:3b-instruct-fp16). (llama-stack.readthedocs.io)
Fix model_id to use Ollama provider prefix format.
The model_id should follow the documented llama-stack format with the ollama/ prefix. The correct format is ollama/<ollama-model-name> (e.g., ollama/llama3.2:3b).
Update line 227:
- Change model_id: llama3.2:latest to model_id: ollama/llama3.2:latest (see the sketch after this list)
- Keep provider_model_id: llama3.2:latest (this is the raw Ollama model name)
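A sketch of the corrected entry in examples/ollama-run.yaml (the models: key and indentation are assumed from a typical llama-stack run.yaml layout):

models:
  - model_id: ollama/llama3.2:latest    # llama-stack identifier with the ollama/ provider prefix
    model_type: llm
    provider_id: ollama
    provider_model_id: llama3.2:latest  # raw Ollama model name, as used by `ollama pull`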
🤖 Prompt for AI Agents
In examples/ollama-run.yaml around lines 227 to 230, the model_id is missing the
required Ollama provider prefix; change the model_id from "llama3.2:latest" to
"ollama/llama3.2:latest" while leaving provider_model_id as "llama3.2:latest" so
the file uses the documented llama-stack format (ollama/<ollama-model-name>) and
retains the raw provider model name.
# Try to get available shields, but gracefully handle if safety API is not available
try:
    available_input_shields = [
        shield.identifier
        for shield in filter(is_input_shield, await client.shields.list())
    ]
    available_output_shields = [
        shield.identifier
        for shield in filter(is_output_shield, await client.shields.list())
    ]
    if not available_input_shields and not available_output_shields:
        logger.info("No available shields. Disabling safety")
    else:
        logger.info(
            "Available input shields: %s, output shields: %s",
            available_input_shields,
            available_output_shields,
        )
except (ValueError, KeyError) as e:
    # Safety API not available (e.g., when using minimal Ollama configuration)
    logger.info("Safety API not available, disabling shields: %s", e)
    available_input_shields = []
    available_output_shields = []
🧩 Analysis chain
Verify the correct exceptions to catch for shield list failures.
The try/except block catches ValueError and KeyError for shield discovery failures. While this provides graceful degradation when the safety API is unavailable (important for Ollama), these exception types are quite broad and might inadvertently suppress other bugs.
Run the following script to check what exceptions the shields.list() method can raise:
Additionally, consider checking the llama-stack-client documentation for the expected exceptions:
If llama-stack-client defines a specific exception for API unavailability (e.g., APINotAvailableError), catching that would be more precise than catching ValueError and KeyError.
🏁 Script executed:
#!/bin/bash
# Description: Find the shields.list() implementation and its exception handling
# Search for shields.list implementation in llama-stack-client
ast-grep --pattern $'async def list($$$):
$$$'
# Also search for shield-related exception handling
rg -nP --type=py -C5 'shields\.list\(\)' -g '!test*'
# Look for custom exceptions in llama_stack_client
rg -nP --type=py 'class.*Exception|class.*Error' -g '*llama_stack*' -A3
Length of output: 3177
🌐 Web query:
What exceptions can llama_stack_client.shields.list() raise?
💡 Result:
client.shields.list() can raise the standard llama_stack_client API exceptions (same error model used across the SDK):
- Base: APIError. [1]
- Connection / network: APIConnectionError (includes timeouts). [2]
- Non-success HTTP responses: APIStatusError and its status-specific subclasses: BadRequestError (400), AuthenticationError (401), PermissionDeniedError (403), NotFoundError (404), UnprocessableEntityError (422), RateLimitError (429), InternalServerError (>=500). [2]
(These are the documented exceptions the Python client surface uses for calls such as client.shields.list().) [1][2]
Sources:
[1] Python SDK reference — llama-stack (client.shields.list).
[2] llama-stack-client package error handling (PyPI / docs).
Catch the correct exceptions from llama-stack-client instead of generic Python exceptions.
Based on the llama-stack-client documentation, shields.list() raises APIError (base), APIConnectionError for connection/network issues, and APIStatusError with status-specific subclasses—not ValueError or KeyError. The current exception handling is too broad and will suppress legitimate API errors.
Reference the pattern already established in src/app/endpoints/shields.py (line 78), which correctly catches APIConnectionError. Update src/app/endpoints/query.py lines 691–712 to catch APIConnectionError (or APIError as the base class) instead of ValueError and KeyError.
Additionally, src/app/endpoints/streaming_query.py lines 1016–1023 calls shields.list() without any exception handling; add a try/except block there as well.
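A minimal sketch of the suggested change, adapted from the snippet above (the llama_stack_client import path for APIConnectionError/APIError is assumed from the client's documented error model):

from llama_stack_client import APIConnectionError, APIError

# Inside retrieve_response(), replacing the (ValueError, KeyError) handler:
try:
    available_input_shields = [
        shield.identifier
        for shield in filter(is_input_shield, await client.shields.list())
    ]
    available_output_shields = [
        shield.identifier
        for shield in filter(is_output_shield, await client.shields.list())
    ]
except (APIConnectionError, APIError) as e:
    # Safety API unreachable or returned an error (e.g., minimal Ollama configuration)
    logger.info("Safety API not available, disabling shields: %s", e)
    available_input_shields = []
    available_output_shields = []

If APIConnectionError is a subclass of APIError in this client (as in similar SDKs), catching APIError alone is sufficient; catching both is merely redundant, not harmful.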
🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 690 to 712, the exception handler
currently catches ValueError and KeyError from client.shields.list() calls;
replace those with the llama-stack-client exceptions (catch APIConnectionError
for connection issues or APIError as the base class) so real API errors are not
silently suppressed, and keep the same fallback behavior (log a clear message
including the exception and set available_input_shields and
available_output_shields to empty lists). Also update
src/app/endpoints/streaming_query.py around lines 1016 to 1023 to wrap the
shields.list() call in a similar try/except that catches APIConnectionError or
APIError, logs the exception, and falls back to an empty shields list so the
code behaves consistently when the Safety API is unavailable.
/hold
First task to address #784
Description
Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Chores