
Conversation

Contributor

@ilopezluna ilopezluna commented Aug 1, 2025

What does this PR do?

Add support for Docker Model Runner. The implementation follows the same pattern as the Ollama integration but uses OpenAI-compatible endpoints.

Related Issues

#472

Contributor License Agreement

I, @ilopezluna, confirm that I have read and agree to the Contributors License Agreement.

Checklists

  • Tests have been run locally and passed
  • [ ] New tests have been added to any work in /lib

Summary by CodeRabbit

  • New Features
    • Docker Model Runner added as a first-class local provider: connect flow, public connect API, model discovery, UI connect/status controls, icon, and optional custom URL.
  • Bug Fixes
    • Improved connection validation and error handling; updated API-key/disconnect behavior for the new provider.
  • Documentation
    • Provider listings and descriptions updated to include Docker Model Runner.
  • Tests
    • Extensive unit and integration tests covering connection, model parsing, availability, and config persistence.

Contributor

coderabbitai bot commented Aug 1, 2025

Walkthrough

Adds Docker Model Runner as a first-class local provider: backend connection/discovery, OpenAPI endpoint, adapter support, config and enum entries, frontend connect UI and types, expanded model list entries, and extensive unit tests for connection and discovery.

Changes

Cohort / File(s) Change Summary
Backend: Provider API & tests
app/desktop/studio_server/provider_api.py, app/desktop/studio_server/test_provider_api.py
Add connect_docker_model_runner, expose /api/provider/docker_model_runner/connect, integrate Docker Model Runner into available models, update API-key/disconnect handling, and add tests for URL validation, connection errors, persistence, and model mapping.
Docker Model Runner tools & tests
libs/core/kiln_ai/adapters/docker_model_runner_tools.py, libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py
New utilities to resolve base URL, probe /v1/models, parse supported/untested models, represent DockerModelRunnerConnection, and obtain connections via OpenAI client; comprehensive unit tests with mocks.
Adapters & registry / model list
libs/core/kiln_ai/adapters/adapter_registry.py, libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py, libs/core/kiln_ai/adapters/ml_model_list.py, libs/core/kiln_ai/adapters/test_adapter_registry.py
Register docker_model_runner in adapter selection (LiteLlmAdapter), provider-specific response_format/model_id handling, and add many docker_model_runner-backed model entries (including new Gemma variant); add adapter tests.
Provider tooling & config / enums
libs/core/kiln_ai/adapters/provider_tools.py, libs/core/kiln_ai/utils/config.py, libs/core/kiln_ai/datamodel/datamodel_enums.py
Add provider_enabled support for docker_model_runner, provider_name mapping, new config docker_model_runner_base_url, and enum member docker_model_runner.
Frontend: API schema, types, images, UI
app/web_ui/src/lib/api_schema.d.ts, app/web_ui/src/lib/types.ts, app/web_ui/src/lib/ui/provider_image.ts, app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
Add OpenAPI schema DockerModelRunnerConnection, extend ModelProviderName enum, export type alias, add provider image mapping, and implement connect UI/state, custom-URL dialog, and connect flow for Docker Model Runner.
Provider utilities tests
libs/core/kiln_ai/adapters/test_provider_tools.py
Add tests for provider_enabled behavior, connection error handling, and provider_name_from_id for docker_model_runner.
Misc & docs / generation imports
.cursor/rules/project.mdc, app/desktop/studio_server/test_generate_docs.py, libs/core/kiln_ai/adapters/test_ml_model_list.py
Typo fix in rules, update several test imports to new paths, and add test ensuring unique providers per model.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Frontend
    participant Backend
    participant DMR as DockerModelRunner

    User->>Frontend: Open connect UI / choose Docker Model Runner
    Frontend->>Backend: GET /api/provider/docker_model_runner/connect?docker_model_runner_custom_url=...
    Backend->>DMR: GET /v1/models
    DMR-->>Backend: return models list or error
    Backend-->>Frontend: DockerModelRunnerConnection (supported/untested) or HTTP error
    Frontend-->>User: Display connection status and models
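
For reference, here is a minimal sketch of the probe step in the diagram, assuming an OpenAI-compatible /v1/models response shape and the default DMR URL discussed later in this PR; the helper name is illustrative, not the project's actual function.

import httpx

# Assumed default from this PR's discussion; the real value comes from Kiln config.
DEFAULT_DMR_BASE_URL = "http://localhost:12434/engines/llama.cpp"


async def probe_dmr_models(base_url: str = DEFAULT_DMR_BASE_URL) -> list[str]:
    """Hypothetical helper: list model IDs from an OpenAI-compatible /v1/models endpoint."""
    async with httpx.AsyncClient() as client:
        response = await client.get(f"{base_url}/v1/models", timeout=5.0)
        response.raise_for_status()
        payload = response.json()
        # OpenAI-compatible servers return {"object": "list", "data": [{"id": ...}, ...]}
        return [model["id"] for model in payload.get("data", [])]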

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~35 minutes

Possibly related PRs

Suggested labels

codex

Suggested reviewers

  • scosman

Poem

In my burrow I tap a key and hum,
A Docker runner greets the sun.
Models line up, ready to run,
No keys, just kernels — oh what fun!
Hops and bytes, a rabbit's code-run. 🐰🚀


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🔭 Outside diff range comments (1)
libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py (1)

111-125: Fix incorrect exception mocking in failure test

The test is mocking httpx.RequestError as a generic Exception, but the implementation specifically catches httpx.RequestError. This doesn't properly test the exception handling.

 @pytest.mark.asyncio
 async def test_docker_model_runner_online_failure():
     """Test that docker_model_runner_online returns False when service is unavailable."""
     with patch("kiln_ai.adapters.docker_model_runner_tools.httpx") as mock_httpx:
-        mock_httpx.get.side_effect = Exception("Connection error")
-        mock_httpx.RequestError = Exception
+        import httpx
+        mock_httpx.RequestError = httpx.RequestError
+        mock_httpx.get.side_effect = httpx.RequestError("Connection error")
 
         from kiln_ai.adapters.docker_model_runner_tools import (
             docker_model_runner_online,
         )
 
         result = await docker_model_runner_online()
 
         assert result is False
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f829026 and 78fcde4.

⛔ Files ignored due to path filters (1)
  • app/web_ui/static/images/docker.svg is excluded by !**/*.svg
📒 Files selected for processing (12)
  • app/desktop/studio_server/provider_api.py (7 hunks)
  • app/web_ui/src/lib/api_schema.d.ts (4 hunks)
  • app/web_ui/src/lib/ui/provider_image.ts (1 hunks)
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte (5 hunks)
  • libs/core/kiln_ai/adapters/adapter_registry.py (1 hunks)
  • libs/core/kiln_ai/adapters/docker_model_runner_tools.py (1 hunks)
  • libs/core/kiln_ai/adapters/ml_model_list.py (1 hunks)
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (2 hunks)
  • libs/core/kiln_ai/adapters/provider_tools.py (3 hunks)
  • libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py (1 hunks)
  • libs/core/kiln_ai/datamodel/datamodel_enums.py (1 hunks)
  • libs/core/kiln_ai/utils/config.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
app/web_ui/src/**/*.{js,jsx,ts,tsx,svelte}

📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)

Our web UI component framework is DaisyUI v4 (based on Tailwind). Be sure to use v4 docs and features, not v5

Files:

  • app/web_ui/src/lib/ui/provider_image.ts
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
  • app/web_ui/src/lib/api_schema.d.ts
**/*.py

📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)

**/*.py: Always assume pydantic 2 (not pydantic 1)
The project supports Python 3.10 and above

Files:

  • libs/core/kiln_ai/datamodel/datamodel_enums.py
  • libs/core/kiln_ai/utils/config.py
  • libs/core/kiln_ai/adapters/adapter_registry.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
  • libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py
  • libs/core/kiln_ai/adapters/provider_tools.py
  • app/desktop/studio_server/provider_api.py
  • libs/core/kiln_ai/adapters/docker_model_runner_tools.py
**/test_*.py

📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)

**/test_*.py: Always use pytest for tests in Python code
Test brevity is important. Use approaches for re-use and brevity including using fixtures for repeated code, and using pytest parameterize for similar tests

Files:

  • libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py
🧠 Learnings (2)
📚 Learning: the `glm_z1_rumination_32b_0414` model was intentionally removed from the built_in_models list due t...
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:0-0
Timestamp: 2025-07-16T09:37:39.816Z
Learning: The `glm_z1_rumination_32b_0414` model was intentionally removed from the built_in_models list due to output formatting issues: output was duplicated in both `output` and `reasoning` fields, and contained random internal JSON in the output. This model should not be re-added without addressing these formatting problems.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: in the kiln web ui's project import functionality (edit_project.svelte), when the file selector api ...
Learnt from: scosman
PR: Kiln-AI/Kiln#464
File: app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_project/edit_project.svelte:116-138
Timestamp: 2025-07-30T02:50:29.383Z
Learning: In the Kiln web UI's project import functionality (edit_project.svelte), when the file selector API fails, the preferred approach is to use alert() as a brief notification and set select_file_unavailable = true to gracefully fall back to manual path entry, rather than using the form's error handling system. This maintains a smooth user experience with automatic fallback rather than showing an error state.

Applied to files:

  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
🧬 Code Graph Analysis (4)
libs/core/kiln_ai/adapters/ml_model_list.py (2)
libs/core/kiln_ai/datamodel/datamodel_enums.py (2)
  • ModelProviderName (80-100)
  • StructuredOutputMode (23-45)
app/web_ui/src/lib/types.ts (2)
  • ModelProviderName (41-41)
  • StructuredOutputMode (40-40)
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (1)
libs/core/kiln_ai/datamodel/datamodel_enums.py (1)
  • ModelProviderName (80-100)
libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py (1)
libs/core/kiln_ai/adapters/docker_model_runner_tools.py (5)
  • DockerModelRunnerConnection (41-48)
  • docker_model_runner_base_url (11-22)
  • parse_docker_model_runner_models (52-88)
  • all_models (47-48)
  • docker_model_runner_online (25-38)
libs/core/kiln_ai/adapters/provider_tools.py (2)
libs/core/kiln_ai/adapters/docker_model_runner_tools.py (1)
  • get_docker_model_runner_connection (91-110)
libs/core/kiln_ai/datamodel/datamodel_enums.py (1)
  • ModelProviderName (80-100)
🔇 Additional comments (36)
app/web_ui/src/lib/ui/provider_image.ts (1)

9-9: LGTM!

The addition of the docker_model_runner provider image mapping follows the established pattern and uses a consistent file path structure.

libs/core/kiln_ai/datamodel/datamodel_enums.py (1)

100-100: LGTM!

The enum addition follows the established pattern with the string value matching the enum name, maintaining consistency across the codebase.

libs/core/kiln_ai/utils/config.py (1)

57-60: LGTM!

The configuration property addition follows the established pattern for base URL configurations and uses appropriate environment variable naming conventions.

libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (2)

184-186: LGTM!

The Docker Model Runner integration correctly follows the established pattern used for Ollama, treating it as an OpenAI-compatible provider with JSON schema support. The implementation and comment are both accurate.


322-325: LGTM!

The implementation correctly treats Docker Model Runner as a custom provider (similar to Ollama) to enable direct control over API requests. This approach is appropriate for supporting advanced features like response_format=json_schema.
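
For context, this is roughly what a json_schema-constrained request to an OpenAI-compatible endpoint such as Docker Model Runner looks like. This is only a sketch: the base URL, placeholder key, model tag, and schema are illustrative, and Kiln actually routes these calls through LiteLLM rather than the openai client.

from openai import OpenAI

# Illustrative values; not the adapter's actual configuration.
client = OpenAI(base_url="http://localhost:12434/engines/llama.cpp/v1", api_key="DMR")

completion = client.chat.completions.create(
    model="ai/llama3.2:3B-Q4_K_M",
    messages=[{"role": "user", "content": "Return the capital of France as JSON."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "capital",
            "schema": {
                "type": "object",
                "properties": {"capital": {"type": "string"}},
                "required": ["capital"],
            },
        },
    },
)
print(completion.choices[0].message.content)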

libs/core/kiln_ai/adapters/ml_model_list.py (1)

1228-1233: LGTM! Docker Model Runner provider properly configured.

The new provider entry follows established patterns and is well-configured:

  • Uses appropriate structured output mode (json_schema) for OpenAI-compatible endpoints
  • Correctly disables data generation for a 3B model
  • Model ID follows Docker naming conventions
  • Aligns with the PR objective to add Docker Model Runner support
libs/core/kiln_ai/adapters/adapter_registry.py (1)

109-126: LGTM! Docker Model Runner adapter follows established patterns.

The implementation correctly mirrors the Ollama pattern:

  • Uses LiteLlmAdapter with OpenAI-compatible configuration
  • Constructs base URL from config with sensible default fallback
  • Adds "/v1" suffix for OpenAI-compatible endpoint
  • Uses placeholder API key "DMR" to satisfy LiteLLM requirements (similar to Ollama's "NA")
  • Default URL points to appropriate llamacpp endpoint on distinct port (12434)

This integration properly extends the adapter registry to support the new Docker Model Runner provider.
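
As an aside, a minimal sketch of the URL/key pattern described above, shown with the openai client for brevity (the project goes through LiteLlmAdapter); the config attribute and default URL come from this PR's description, and the helper name is assumed.

from openai import AsyncOpenAI

from kiln_ai.utils.config import Config


def docker_model_runner_client() -> AsyncOpenAI:
    # Base URL from Kiln config, falling back to the default noted in this review.
    base_url = (
        Config.shared().docker_model_runner_base_url
        or "http://localhost:12434/engines/llama.cpp"
    )
    # DMR needs no real API key; "DMR" is a placeholder, analogous to Ollama's "NA".
    return AsyncOpenAI(base_url=f"{base_url}/v1", api_key="DMR")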

libs/core/kiln_ai/adapters/provider_tools.py (3)

5-7: LGTM: Import statement follows established pattern

The import correctly brings in the Docker Model Runner connection function, following the same pattern as the Ollama integration above it.


43-50: LGTM: Provider enablement logic is consistent

The Docker Model Runner enablement check follows the exact same pattern as the Ollama provider, checking for connection availability and the presence of supported or untested models. The implementation is correct and consistent.
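
Roughly, the enablement check amounts to something like the following sketch (the real logic lives in provider_tools.provider_enabled; the standalone function name here is hypothetical):

from kiln_ai.adapters.docker_model_runner_tools import get_docker_model_runner_connection


async def docker_model_runner_enabled() -> bool:
    # Enabled when the service answers and reports at least one supported or untested model.
    conn = await get_docker_model_runner_connection()
    return conn is not None and len(conn.all_models()) > 0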


399-400: LGTM: Human-readable name mapping added correctly

The provider name mapping correctly returns "Docker Model Runner" for the docker_model_runner provider ID, maintaining consistency with other provider mappings in the match statement.

app/web_ui/src/lib/api_schema.d.ts (4)

423-439: LGTM! Endpoint definition follows established patterns.

The new Docker Model Runner connection endpoint follows the exact same structure as the existing Ollama provider endpoint, ensuring consistency in the API schema.


1541-1551: LGTM! Schema definition maintains consistency.

The DockerModelRunnerConnection schema correctly mirrors the structure of the existing OllamaConnection schema, ensuring consistent typing across different provider connection responses.


2180-2180: LGTM! Enum extension follows naming conventions.

The addition of "docker_model_runner" to the ModelProviderName enum follows the established snake_case naming convention and is consistent with the provider identifier used throughout the API.


3913-3943: LGTM! Operation definition maintains API consistency.

The operation definition correctly follows the established pattern from the Ollama provider, including the optional custom URL parameter and appropriate response types. The use of the new DockerModelRunnerConnection schema ensures type safety.

libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py (6)

1-10: LGTM!

Import statements are appropriate for the test module.


12-18: LGTM!

Test correctly verifies the default base URL behavior.


20-28: LGTM!

Test correctly verifies custom base URL configuration.


60-70: LGTM!

Test correctly handles the case of no available models.


72-81: LGTM!

Test correctly handles invalid response format.


83-93: LGTM!

Test correctly verifies the all_models() method combines both model lists.

libs/core/kiln_ai/adapters/docker_model_runner_tools.py (5)

1-9: LGTM!

Import statements are appropriate for the module's functionality.


11-23: LGTM!

Function correctly handles base URL configuration with appropriate fallback.


41-49: LGTM!

Well-structured Pydantic model with appropriate fields and helper method.


51-89: LGTM!

Function correctly parses model responses and categorizes them appropriately.
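
In essence, the categorization works like this simplified stand-in, which assumes the function receives openai.types.Model entries (per the later commit replacing the Any parameter) and buckets IDs by whether they appear in Kiln's built-in model list; the lookup set and helper name are simplifications.

from openai.types import Model


def split_models(models: list[Model], known_ids: set[str]) -> tuple[list[str], list[str]]:
    # Supported: IDs Kiln already knows from its built-in model list; untested: everything else.
    supported = [m.id for m in models if m.id in known_ids]
    untested = [m.id for m in models if m.id not in known_ids]
    return supported, untested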


113-118: LGTM!

Simple utility function correctly checks model availability.

app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte (4)

61-66: LGTM!

Docker Model Runner provider definition is consistent with other providers.


225-230: LGTM!

Status initialization for Docker Model Runner is consistent with other providers.


373-375: LGTM!

Provider connection routing is properly integrated.


680-684: LGTM!

Docker Model Runner initialization on mount follows the same pattern as Ollama.

app/desktop/studio_server/provider_api.py (7)

11-14: LGTM!

Import statements for Docker Model Runner tools are appropriate.


84-146: LGTM!

The connect_docker_model_runner function is well-implemented with proper validation and error handling.


268-273: LGTM!

API endpoint follows the established pattern for provider connections.


241-245: LGTM!

Docker Model Runner integration into available models list is properly positioned.


384-385: LGTM!

Docker Model Runner correctly added to providers that don't support API key connections.


440-441: LGTM!

Docker Model Runner correctly added to providers that don't support API key disconnections.


1038-1058: LGTM!

The models_from_docker_model_runner_id function correctly maps Docker model IDs to Kiln models.

Comment on lines 472 to 541
const connect_docker_model_runner = async (user_initiated: boolean = true) => {
status.docker_model_runner.connected = false
status.docker_model_runner.connecting = user_initiated
let data: any | null = null
try {
const { data: req_data, error: req_error } = await client.GET(
"/api/provider/docker_model_runner/connect",
{
params: {
query: {
custom_docker_url: custom_docker_url || undefined,
},
},
},
)
if (req_error) {
throw req_error
}
data = req_data
} catch (e) {
if (
e &&
typeof e === "object" &&
"message" in e &&
typeof e.message === "string"
) {
status.docker_model_runner.error = e.message
} else {
status.docker_model_runner.error = "Failed to connect. Ensure Docker Model Runner is running."
}
status.docker_model_runner.connected = false
return
} finally {
status.docker_model_runner.connecting = false
}
if (
data.supported_models.length === 0 &&
(!data.untested_models || data.untested_models.length === 0)
) {
status.docker_model_runner.error =
"Docker Model Runner running, but no models available. Pull some models first: https://hub.docker.com/u/ai"
return
}
status.docker_model_runner.error = null
status.docker_model_runner.connected = true
const supported_models_str =
data.supported_models.length > 0
? "The following supported models are available: " +
data.supported_models.join(", ") +
". "
: "No supported models are loaded. "
const untested_models_str =
data.untested_models && data.untested_models.length > 0
? "The following untested models are loaded: " +
data.untested_models.join(", ") +
". "
: ""
const custom_url_str =
custom_docker_url && custom_docker_url !== "http://localhost:12434/engines/llama.cpp"
? "Custom Docker Model Runner URL: " + custom_docker_url
: ""
status.docker_model_runner.custom_description =
"Docker Model Runner connected. " +
supported_models_str +
untested_models_str +
custom_url_str
}
Contributor

⚠️ Potential issue

Fix parameter name typo and use constant for default URL

  1. Parameter user_initiated has a typo when used as user_initated
  2. Default URL is hardcoded in the comparison
-const connect_docker_model_runner = async (user_initiated: boolean = true) => {
+const connect_docker_model_runner = async (user_initiated: boolean = true) => {
   status.docker_model_runner.connected = false
-  status.docker_model_runner.connecting = user_initiated
+  status.docker_model_runner.connecting = user_initiated

   // ... rest of the function ...

   const custom_url_str =
-    custom_docker_url && custom_docker_url !== "http://localhost:12434/engines/llama.cpp"
+    custom_docker_url && custom_docker_url !== DEFAULT_DOCKER_URL
       ? "Custom Docker Model Runner URL: " + custom_docker_url
       : ""

Add this constant at the top of the script section:

const DEFAULT_DOCKER_URL = "http://localhost:12434/engines/llama.cpp"
🤖 Prompt for AI Agents
In
app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
between lines 472 and 541, fix the typo in the parameter name by replacing all
instances of "user_initated" with the correct "user_initiated". Also, define a
constant DEFAULT_DOCKER_URL at the top of the script section with the value
"http://localhost:12434/engines/llama.cpp" and replace the hardcoded default URL
string in the comparison with this constant to improve maintainability.

Collaborator

scosman commented Aug 1, 2025

@ilopezluna very cool!

Our CR bot code rabbit found some small issues (typos, etc). Want to tackle those (or just mark as resolved if it's wrong about something) and let me know when it's ready?

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
app/desktop/studio_server/provider_api.py (1)

133-137: Use specific exception handling instead of generic Exception.

This is the same issue identified in the past review. The function should catch specific exceptions rather than the generic Exception to avoid hiding unexpected errors.

-    except Exception as e:
+    except (openai.APIError, httpx.RequestError) as e:
         raise HTTPException(
             status_code=500,
             detail=f"Failed to connect to Docker Model Runner: {e}",
         )
🧹 Nitpick comments (1)
app/desktop/studio_server/provider_api.py (1)

100-102: Move imports to the top of the file.

The imports inside the function should be moved to the top of the file for better organization and consistency with Python conventions.

+from kiln_ai.adapters.docker_model_runner_tools import (
+    docker_model_runner_base_url,
+    parse_docker_model_runner_models,
+)

Then remove the local imports from within the function.

Also applies to: 116-118

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 78fcde4 and fbe9ae2.

📒 Files selected for processing (5)
  • app/desktop/studio_server/provider_api.py (7 hunks)
  • app/web_ui/src/lib/types.ts (1 hunks)
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte (7 hunks)
  • libs/core/kiln_ai/adapters/docker_model_runner_tools.py (1 hunks)
  • libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • app/web_ui/src/lib/types.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py
  • libs/core/kiln_ai/adapters/docker_model_runner_tools.py
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)

**/*.py: Always assume pydantic 2 (not pydantic 1)
The project supports Python 3.10 and above

Files:

  • app/desktop/studio_server/provider_api.py
🧬 Code Graph Analysis (1)
app/desktop/studio_server/provider_api.py (5)
libs/core/kiln_ai/adapters/docker_model_runner_tools.py (4)
  • DockerModelRunnerConnection (43-50)
  • get_docker_model_runner_connection (93-112)
  • docker_model_runner_base_url (11-22)
  • parse_docker_model_runner_models (54-90)
libs/core/kiln_ai/utils/config.py (3)
  • Config (28-252)
  • shared (149-152)
  • save_setting (237-238)
libs/core/kiln_ai/adapters/provider_tools.py (1)
  • provider_name_from_id (353-405)
app/web_ui/src/lib/stores.ts (1)
  • provider_name_from_id (316-324)
libs/core/kiln_ai/adapters/ml_model_list.py (2)
  • KilnModel (207-222)
  • KilnModelProvider (161-204)
🔇 Additional comments (6)
app/desktop/studio_server/provider_api.py (6)

6-6: LGTM - Imports are properly used.

The new imports are appropriately added to support Docker Model Runner functionality and are actually used in the code.

Also applies to: 12-15


269-273: LGTM - API endpoint follows established patterns.

The new endpoint is consistent with the existing Ollama connection endpoint and properly structured.


242-245: LGTM - Integration follows established patterns.

The Docker Model Runner integration in the available models list is consistent with the Ollama pattern and includes proper positioning logic.


385-385: LGTM - Correctly excludes Docker Model Runner from API key operations.

Docker Model Runner, like Ollama, doesn't require API keys, so correctly excluding it from these operations follows the established pattern.

Also applies to: 441-441


979-1037: LGTM - Function follows established patterns with proper exception handling.

This function correctly mirrors the Ollama pattern and properly handles specific exceptions as requested in previous feedback. The implementation is consistent and well-structured.


1039-1059: LGTM - Helper function follows established patterns.

The function correctly mirrors the models_from_ollama_tag pattern and provides consistent model mapping functionality for Docker Model Runner.

@ilopezluna
Contributor Author

@ilopezluna very cool!

Our CR bot code rabbit found some small issues (typos, etc). Want to tackle those (or just mark as resolved if it's wrong about something) and let me know when it's ready?

Thank you @scosman! CodeRabbit looks really cool, I hadn’t heard of it before 👏
I’ve just pushed the changes it suggested. I don’t have much experience with Python, so I’m trusting its judgment more than mine.

Collaborator

@scosman scosman left a comment

This is great! Easy to use like Ollama. Nice work!

Code generally looks great and matches the project style. I left a few comments while looking around, nothing major. I can take a deeper look next week.

Random feedback as new DMR user

Great UX. Opened desktop, clicked run, worked.

One thing I would change: the 'latest' tag has been nothing but headaches for me with Ollama. Usability wise, it's a bit hard to figure out which model size it is. Over time they can release new models in same family, and then 'latest' is either not actually latest, or a different size than it used to be. When we store that a response was generated by a specific model 'latest' can burn us; we want the model name to be explicit and permanent. I'd suggest keeping model tags tied to the size.

),
KilnModelProvider(
name=ModelProviderName.docker_model_runner,
model_id="ai/llama3.2",
Collaborator

@scosman scosman Aug 1, 2025

We should set the size explicitly, so if the default ever changes, our data is still correct. llama3.2:3B-Q4_K_M?

We should also add the rest of the models. I know the list is big and ugly... but it's actually really good for usability. Details: https://getkiln.ai/blog/i_wrote_2000_llm_test_cases_so_you_dont_have_to

Adding the models docker supports in there will do a few nice things:

  • Nice names in the UI
  • Add them to the test suite
  • Once tested that they will appear higher in the dropdown as we'll trust them to perform certain tasks better (JSON mode, complex requests like synthetic data, etc).

Contributor Author

I have updated the model references to include a more explicit tag: <weight>-<quantization>.
I considered adding all supported models as well, but I thought it might be better to include them in a follow-up PR that focuses solely on listing the supported models.

That said, if you’d prefer to include them now, I’m happy to do so.

Collaborator

scosman commented Aug 2, 2025

oh small nit: Finetune badge doesn't need a star. Stars are to differentiate "suggested" but we don't have that concept for fine tunes

@ilopezluna
Contributor Author

One thing I would change: the 'latest' tag has been nothing but headaches for me with Ollama. Usability wise, it's a bit hard to figure out which model size it is. Over time they can release new models in same family, and then 'latest' is either not actually latest, or a different size than it used to be. When we store that a response was generated by a specific model 'latest' can burn us; we want the model name to be explicit and permanent. I'd suggest keeping model tags tied to the size.

Yeah, I remember we had this same discussion in the DMR team :)
We were very close to not including the latest tag, precisely for the reasons you mentioned.
We ended up adding it because we thought it might help new users. At the time, we were focused on driving adoption, so we decided to include it and updated the model overview on Hub to clearly indicate which version latest points to: https://hub.docker.com/r/ai/llama3.2
However, I'm still not convinced about using this tag because, as you mention, it creates confusion.

@ilopezluna
Contributor Author

oh small nit: Finetune badge doesn't need a star. Stars are to differentiate "suggested" but we don't have that concept for fine tunes

Sorry @scosman I didn't quite get that. Could you please clarify what you mean?

Collaborator

scosman commented Aug 4, 2025

Sorry @scosman I didn't quite get that. Could you please clarify what you mean?

That was a comment on another PR. No idea how I managed to drop it here. Please disregard!

Collaborator

@scosman scosman left a comment

some small changes/cleanup, but nothing major.

Great work navigating the codebase. Providers touch a lot of areas and were some of the earliest code!

model_id="ai/llama3.2:3B-Q4_K_M",
structured_output_mode=StructuredOutputMode.json_schema,
supports_data_gen=False,
),
Collaborator

It would be good to add the others in this PR. The usability of DMR goes way up with them.

Contributor Author

Added in 6fc2711

Contributor Author

BTW, I've also added the models with the latest and default tags. Most DMR users will use those, although, as we discussed, it can still be confusing.
Because these tags point to the same model, I was wondering whether it would be nice to have a way to allow multiple IDs or aliases per model.
Anyway, let me know if you prefer to remove these model references from this PR.

Collaborator

Ah, so we can only have 1 entry per provider in there. I'll add a test to check that. If we add multiple, "Llama 8B on DMR" becomes ambiguous, which isn't good for reproducibility.

Let's just use the Q4_K_M versions (or similar)? The user can still use the others; they will automatically appear in the dropdown, but with their own more specific IDs.

@leonardmq
Collaborator

Nice! I see DMR also has some (or one?) embedding models that we might be able to add support for in our upcoming RAG tooling after this is merged.

This is the first external PR since we added a test coverage CI. I tried to run it (results are here), but it looks like it is not posting the nicely formatted report back as a comment on the PR if it is from a fork 😄

Replace Any parameter with List[openai.types.Model] and eliminate brittle dict parsing logic
# Conflicts:
#	app/web_ui/src/lib/api_schema.d.ts
#	libs/core/kiln_ai/adapters/provider_tools.py
#	libs/core/kiln_ai/datamodel/datamodel_enums.py
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
app/desktop/studio_server/provider_api.py (2)

1040-1098: Populate model flags from provider metadata and reflect DMR’s JSON-schema support.

  • Use provider’s uncensored and suggested_for_uncensored_data_gen instead of hardcoding.
  • Set supports_structured_output=True for Docker MR (DMR supports json_schema-constrained decoding natively). Also set it true for untested models to avoid misleading UI warnings.
-                            supports_structured_output=docker_provider.supports_structured_output,
+                            supports_structured_output=True,
@@
-                            uncensored=False,
-                            suggested_for_uncensored_data_gen=False,
+                            uncensored=docker_provider.uncensored,
+                            suggested_for_uncensored_data_gen=docker_provider.suggested_for_uncensored_data_gen,
@@
-                    supports_structured_output=False,
+                    supports_structured_output=True,

To double-check that the provider metadata includes uncensored and suggested_for_uncensored_data_gen for Docker MR, run:

#!/bin/bash
# Inspect provider definitions for docker_model_runner
fd -t f ml_model_list.py | xargs rg -n "docker_model_runner" -A 8 -B 8

221-225: Fix insertion index to keep ordering stable (Ollama first, then Docker if both present).

Using insert(-1, ...) is surprising; it inserts before the last item and may shuffle relative position against key-based providers. Deterministically place Docker right after Ollama, or at the start if Ollama absent.

-        if docker_models:
-            models.insert(-1 if ollama_models else 0, docker_models)
+        if docker_models:
+            insert_idx = 1 if ollama_models else 0
+            models.insert(insert_idx, docker_models)
🧹 Nitpick comments (1)
app/desktop/studio_server/provider_api.py (1)

248-254: Minor: simplify by removing the temporary variable.

Keeps parity with the Ollama connect endpoint and reduces noise.

-    async def connect_docker_model_runner_api(
-        docker_model_runner_custom_url: str | None = None,
-    ) -> DockerModelRunnerConnection:
-        chosen_url = docker_model_runner_custom_url
-        return await connect_docker_model_runner(chosen_url)
+    async def connect_docker_model_runner_api(
+        docker_model_runner_custom_url: str | None = None,
+    ) -> DockerModelRunnerConnection:
+        return await connect_docker_model_runner(docker_model_runner_custom_url)

If you apply the earlier param-rename, calling with a keyword is also fine.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between baccb52 and 5b690ad.

📒 Files selected for processing (13)
  • .cursor/rules/project.mdc (1 hunks)
  • app/desktop/studio_server/provider_api.py (7 hunks)
  • app/web_ui/src/lib/api_schema.d.ts (4 hunks)
  • app/web_ui/src/lib/ui/provider_image.ts (1 hunks)
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte (10 hunks)
  • libs/core/kiln_ai/adapters/adapter_registry.py (1 hunks)
  • libs/core/kiln_ai/adapters/docker_model_runner_tools.py (1 hunks)
  • libs/core/kiln_ai/adapters/ml_model_list.py (1 hunks)
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (2 hunks)
  • libs/core/kiln_ai/adapters/provider_tools.py (3 hunks)
  • libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py (1 hunks)
  • libs/core/kiln_ai/datamodel/datamodel_enums.py (1 hunks)
  • libs/core/kiln_ai/utils/config.py (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • .cursor/rules/project.mdc
  • app/web_ui/src/lib/ui/provider_image.ts
  • libs/core/kiln_ai/adapters/provider_tools.py
🚧 Files skipped from review as they are similar to previous changes (9)
  • libs/core/kiln_ai/datamodel/datamodel_enums.py
  • libs/core/kiln_ai/utils/config.py
  • libs/core/kiln_ai/adapters/adapter_registry.py
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
  • libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py
  • libs/core/kiln_ai/adapters/docker_model_runner_tools.py
  • app/web_ui/src/lib/api_schema.d.ts
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)

**/*.py: Always assume pydantic 2 (not pydantic 1)
The project supports Python 3.10 and above

Files:

  • app/desktop/studio_server/provider_api.py
🧠 Learnings (1)
📚 Learning: 2025-08-08T15:50:45.306Z
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:221-228
Timestamp: 2025-08-08T15:50:45.306Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.KilnModelProvider, naming policy: keep cross-provider feature flags unprefixed (e.g., reasoning_optional_for_structured_output) to allow reuse across providers; prefix provider-specific toggles with the provider name (e.g., siliconflow_enable_thinking) when the implementation is specific and potentially unsafe to generalize.

Applied to files:

  • app/desktop/studio_server/provider_api.py
🔇 Additional comments (4)
app/desktop/studio_server/provider_api.py (4)

6-6: Imports look good and are placed at top.

httpx is used for specific exception handling. Docker Model Runner tools are properly imported.

Also applies to: 12-15


103-107: Good, explicit connection error message.

Clear guidance about enabling TCP and checking docs is user-friendly right where it matters.


369-370: Correctly mark Docker Model Runner as not supporting API keys.

Consistent with provider capabilities; prevents unsupported connect/disconnect attempts.

Also applies to: 429-430


1100-1120: Confirm whether DMR model IDs can have aliases (e.g., “:latest”) and handle them if so.

Ollama has alias/tag handling; Docker MR may not, but if aliases exist, map them similarly to avoid missing known models.

If aliases exist, we can extend this function to match alias sets. Would you like a patch once confirmed?

…o expected client errors.

This prevents masking unexpected bugs and aligns with earlier feedback on specific exception handling.
- When only Ollama is available → Ollama appears first
- When both are available → Ollama appears first, Docker second
@ilopezluna
Contributor Author

Nice! I see DMR also has some (or one?) embedding models that we might be able to add support for in our upcoming RAG tooling after this is merged.

We recently added: https://hub.docker.com/r/ai/granite-embedding-multilingual and https://hub.docker.com/r/ai/nomic-embed-text-v1.5 as well 👍

This is the first external PR since we added a test coverage CI. I tried to run it (results are here), but it looks like it is not posting the nicely formatted report back as a comment on the PR if it is from a fork 😄

Thanks for pointing it out, @leonardmq! I'm not familiar with the coverage conventions of the project; could you advise me on whether I should add more tests?

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte (1)

576-580: Hardcoded default URL comparison.

The comparison uses a hardcoded URL string instead of a constant, which reduces maintainability.

Consider defining a constant at the top of the script section for better maintainability:

+const DEFAULT_DOCKER_URL = "http://localhost:12434/engines/llama.cpp"
+
 const connect_docker_model_runner = async (
   user_initiated: boolean = true,
 ) => {
   // ... rest of function ...
   
   const custom_url_str =
-    docker_model_runner_custom_url &&
-    docker_model_runner_custom_url !==
-      "http://localhost:12434/engines/llama.cpp"
+    docker_model_runner_custom_url && docker_model_runner_custom_url !== DEFAULT_DOCKER_URL
       ? "Custom Docker Model Runner URL: " + docker_model_runner_custom_url
       : ""
🧹 Nitpick comments (2)
app/desktop/studio_server/provider_api.py (1)

219-223: Docker Model Runner models are inserted at index 0.

While the implementation is correct, inserting Docker Model Runner models at index 0 means they will appear before Ollama models in the UI, which may not be the intended ordering given that Ollama was traditionally the primary local provider.

Consider whether Docker Model Runner should be ordered after Ollama to maintain consistency with existing user expectations, or if this new ordering is intentional.

app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte (1)

1082-1082: Inconsistent placeholder URL.

The placeholder shows http://localhost:12434/engines but the description mentions http://localhost:12434/engines/llama.cpp and the hardcoded comparison in the code uses the latter.

Consider making the placeholder consistent with the actual default URL:

-      placeholder="http://localhost:12434/engines"
+      placeholder="http://localhost:12434/engines/llama.cpp"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b690ad and b856920.

📒 Files selected for processing (2)
  • app/desktop/studio_server/provider_api.py (7 hunks)
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte (10 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)

**/*.py: Always assume pydantic 2 (not pydantic 1)
The project supports Python 3.10 and above

Files:

  • app/desktop/studio_server/provider_api.py
app/web_ui/src/**/*.{js,jsx,ts,tsx,svelte}

📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)

Our web UI component framework is DaisyUI v4 (based on Tailwind). Be sure to use v4 docs and features, not v5

Files:

  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
🧠 Learnings (1)
📚 Learning: 2025-08-08T15:50:45.306Z
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:221-228
Timestamp: 2025-08-08T15:50:45.306Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.KilnModelProvider, naming policy: keep cross-provider feature flags unprefixed (e.g., reasoning_optional_for_structured_output) to allow reuse across providers; prefix provider-specific toggles with the provider name (e.g., siliconflow_enable_thinking) when the implementation is specific and potentially unsafe to generalize.

Applied to files:

  • app/desktop/studio_server/provider_api.py
🧬 Code Graph Analysis (1)
app/desktop/studio_server/provider_api.py (5)
libs/core/kiln_ai/adapters/docker_model_runner_tools.py (3)
  • DockerModelRunnerConnection (43-50)
  • get_docker_model_runner_connection (90-112)
  • docker_model_runner_base_url (11-22)
libs/core/kiln_ai/utils/config.py (2)
  • Config (28-262)
  • shared (159-162)
libs/core/kiln_ai/datamodel/datamodel_enums.py (2)
  • ModelProviderName (80-102)
  • StructuredOutputMode (23-45)
libs/core/kiln_ai/adapters/provider_tools.py (1)
  • provider_name_from_id (342-398)
libs/core/kiln_ai/adapters/ml_model_list.py (2)
  • KilnModel (244-259)
  • KilnModelProvider (189-241)
🔇 Additional comments (19)
app/desktop/studio_server/provider_api.py (8)

6-6: LGTM!

The httpx import is correctly placed at the top level and is needed for the Docker Model Runner exception handling.


12-15: LGTM!

The imports from Docker Model Runner tools are appropriately organized and follow the project's import structure.


85-128: LGTM!

The connect_docker_model_runner function is well-implemented with proper error handling. The URL validation, exception handling with pass-through for HTTPExceptions, and config saving logic all follow established patterns from the Ollama implementation.


251-256: LGTM!

The API endpoint follows the established pattern and correctly delegates to the connection function with the custom URL parameter.


372-372: LGTM!

Correctly marks Docker Model Runner as unsupported for API key connections, consistent with other local providers like Ollama.


432-432: LGTM!

Correctly marks Docker Model Runner as unsupported for API key disconnections, consistent with the connect_api_key implementation.


1043-1101: LGTM!

The available_docker_model_runner_models function follows the same pattern as the Ollama equivalent. The error handling correctly catches the specific exceptions mentioned in past reviews and the model detail mapping looks appropriate.


1103-1123: LGTM!

The models_from_docker_model_runner_id function follows the exact same pattern as models_from_ollama_tag and correctly maps Docker Model Runner model IDs to KilnModel/KilnModelProvider pairs.

app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte (11)

4-7: LGTM!

The import correctly includes the DockerModelRunnerConnection type needed for the new functionality.


64-69: LGTM!

The Docker Model Runner provider entry is well-structured and follows the same pattern as other local providers like Ollama.


253-258: LGTM!

The Docker Model Runner status object follows the same structure as other providers.


413-415: LGTM!

The provider connection logic correctly calls the Docker Model Runner connect function.


427-429: LGTM!

The parameter name change from the typo to user_initiated is correct and maintains consistency.


512-586: LGTM!

The connect_docker_model_runner function is well-implemented and follows the same pattern as connect_ollama. The error handling, status management, and descriptive message construction are all appropriate.


686-688: LGTM!

The Docker Model Runner base URL is correctly loaded from the settings during provider initialization.


735-739: LGTM!

The automatic connection check on mount follows the same pattern as Ollama, appropriately clearing errors for non-user-initiated connections.


747-750: LGTM!

The dialog function follows the same pattern as the Ollama equivalent.


961-968: LGTM!

The error state UI correctly shows the custom URL button when Docker Model Runner has connection issues, following the same pattern as Ollama.


1062-1104: LGTM!

The Docker Model Runner dialog is well-implemented and follows the same structure and styling as the Ollama dialog. The placeholder and description text are appropriate for Docker Model Runner usage.

@leonardmq
Collaborator

Thanks for pointing it out, @leonardmq! I'm not familiar with the coverage conventions of the project; could you advise me on whether I should add more tests?

@ilopezluna - we do not have a hard number for coverage requirements; current overall backend coverage is at 92%. We try to go for 100% coverage, but some modules are harder to test.

Latest diff coverage: https://github.com/Kiln-AI/Kiln/actions/runs/16907251293/artifacts/3746939147

My advice would be to look at the files with low coverage in this diff coverage and see if the untested lines are worth testing, and see if we already have similar tests for ollama or some other provider. For example, for connect_docker_model_runner (in provider_api.py), we might have some ollama tests that might be copiable (in test_provider_api.py). Disclaimer that I am not too familiar with our Ollama code, so am not sure.

To clarify: this is only about the backend; we have few tests on the frontend at the moment, so no need to worry about that part.
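
As a starting point, here is a rough pytest sketch in the spirit of the existing tests; it assumes parse_docker_model_runner_models accepts a list of openai.types.Model (per the commit message above) and that the connection exposes supported_models/untested_models as the frontend consumes them. Treat the field names and the assertion as assumptions to adjust against the real code, and use the Ollama connect tests in test_provider_api.py as the template for covering connect_docker_model_runner itself.

from openai.types import Model

from kiln_ai.adapters.docker_model_runner_tools import (
    DockerModelRunnerConnection,
    parse_docker_model_runner_models,
)


def test_parse_buckets_every_model():
    models = [
        Model(id="ai/llama3.1:8B-Q4_K_M", created=0, object="model", owned_by="docker"),
        Model(id="ai/some-unknown-model", created=0, object="model", owned_by="docker"),
    ]
    conn = parse_docker_model_runner_models(models)
    assert isinstance(conn, DockerModelRunnerConnection)
    # Every reported model should land in exactly one of the two buckets.
    assert len(conn.supported_models) + len(conn.untested_models) == len(models)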

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
app/desktop/studio_server/provider_api.py (1)

1081-1081: Consider conservative default for untested model structured output.

The code sets supports_structured_output=True for untested Docker Model Runner models. While Docker Model Runner does support JSON schema, being more conservative with supports_structured_output=False for truly unknown models might be safer, following the pattern used for Ollama untested models.

From the past review comments, this was flagged as potentially too optimistic.

Consider changing to:

-                    supports_structured_output=True,
+                    supports_structured_output=False,
🧹 Nitpick comments (1)
libs/core/kiln_ai/adapters/ml_model_list.py (1)

1198-1217: Consider consolidating multiple model_id variants.

Multiple entries for the same base model with different tags (e.g., ai/llama3.1, ai/llama3.1:latest, ai/llama3.1:8B-Q4_K_M, ai/llama3.1:8B-F16) create UI clutter. Consider if these represent genuinely different models or if they could be consolidated with a canonical model_id.

From the past review comments, it seems there was discussion about using explicit tags like "3B-Q4_K_M" instead of "latest" for clarity.

Also applies to: 1556-1575, 2757-2791

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b856920 and 40a8d67.

📒 Files selected for processing (2)
  • app/desktop/studio_server/provider_api.py (7 hunks)
  • libs/core/kiln_ai/adapters/ml_model_list.py (19 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)

**/*.py: Always assume pydantic 2 (not pydantic 1)
The project supports Python 3.10 and above

Files:

  • app/desktop/studio_server/provider_api.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
🧠 Learnings (5)
📚 Learning: 2025-08-08T15:50:45.334Z
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:221-228
Timestamp: 2025-08-08T15:50:45.334Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.KilnModelProvider, naming policy: keep cross-provider feature flags unprefixed (e.g., reasoning_optional_for_structured_output) to allow reuse across providers; prefix provider-specific toggles with the provider name (e.g., siliconflow_enable_thinking) when the implementation is specific and potentially unsafe to generalize.

Applied to files:

  • app/desktop/studio_server/provider_api.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:13:26.526Z
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:19:20.074Z
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1736-1741
Timestamp: 2025-08-08T16:19:20.074Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), for ModelName.qwq_32b with ModelProviderName.siliconflow_cn (model_id "Qwen/QwQ-32B"), do not set a parser (r1_thinking/optional_r1_thinking). Verified via tests/responses that SiliconFlow returns outputs that parse correctly without an R1 parser. Parser usage remains provider-specific and should only be added when validated.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:15:20.796Z
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:2434-2439
Timestamp: 2025-08-08T16:15:20.796Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), for ModelName.qwen_3_8b_no_thinking with ModelProviderName.siliconflow_cn (model_id "Qwen/Qwen3-8B"), the qwen3_style_no_think formatter is not required: SiliconFlow’s non‑thinking Qwen3‑8B returns clean output without <answer> wrappers. Formatter/parser usage is provider-specific and should be added only when verified by tests/responses.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:14:54.346Z
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1979-1983
Timestamp: 2025-08-08T16:14:54.346Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py, for ModelName.deepseek_r1_distill_qwen_32b on provider siliconflow_cn (model_id "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"), do not set a parser (r1_thinking/optional_r1_thinking). Verified via testing that SiliconFlow returns final outputs that parse without an R1 parser, and reasoning may be omitted under structured output; rely on reasoning_optional_for_structured_output=True.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
🧬 Code Graph Analysis (2)
app/desktop/studio_server/provider_api.py (5)
libs/core/kiln_ai/adapters/docker_model_runner_tools.py (3)
  • DockerModelRunnerConnection (43-50)
  • get_docker_model_runner_connection (90-112)
  • docker_model_runner_base_url (11-22)
libs/core/kiln_ai/utils/config.py (3)
  • Config (28-262)
  • shared (159-162)
  • save_setting (247-248)
libs/core/kiln_ai/datamodel/datamodel_enums.py (2)
  • ModelProviderName (80-102)
  • StructuredOutputMode (23-45)
libs/core/kiln_ai/adapters/provider_tools.py (1)
  • provider_name_from_id (342-398)
libs/core/kiln_ai/adapters/ml_model_list.py (2)
  • KilnModel (245-260)
  • KilnModelProvider (190-242)
libs/core/kiln_ai/adapters/ml_model_list.py (3)
libs/core/kiln_ai/datamodel/datamodel_enums.py (2)
  • ModelProviderName (80-102)
  • StructuredOutputMode (23-45)
app/web_ui/src/lib/types.ts (2)
  • ModelProviderName (43-43)
  • StructuredOutputMode (42-42)
libs/core/kiln_ai/adapters/parsers/test_r1_parser.py (1)
  • parser (8-9)
🔇 Additional comments (11)
libs/core/kiln_ai/adapters/ml_model_list.py (4)

96-96: LGTM: New model name follows naming conventions.

The addition of gemma_3_0p27b follows the established pattern for model naming with the 270M size encoded as "0p27b".


1725-1738: LGTM: Gemma 3 270M model correctly configured.

The new Gemma 3 270M model is appropriately configured with:

  • Disabled structured output and data generation (appropriate for a 270M parameter model)
  • Docker Model Runner provider only
  • Correct model_id format ai/gemma3:270M-F16

1334-1353: Ensure parser configuration consistency across Docker Model Runner providers.

The Docker Model Runner provider entries show extensive model coverage which is excellent. However, I notice some models (like QwQ, DeepSeek R1 variants) have reasoning capabilities and parsers configured correctly, while others don't. Please verify that all models requiring R1 thinking parsers have them properly configured.

According to the retrieved learnings, parser usage should be provider-specific and validated through tests. The configurations appear consistent with existing patterns for reasoning models.

Also applies to: 1436-1459, 1556-1575, 1618-1642, 1751-1756, 1775-1794, 1875-1898, 1961-1995, 2013-2032, 2289-2304, 2377-2426, 2618-2624, 2757-2791, 2856-2862, 2908-2919
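
For reference, a reasoning-capable DMR entry following that pattern might look like the sketch below. The field names mirror those quoted elsewhere in this review (reasoning_capable, parser=r1_thinking); the model tag and the ModelParserID spelling are assumptions, and a parser should only be added once validated against real DMR responses.

KilnModelProvider(
    name=ModelProviderName.docker_model_runner,
    model_id="ai/qwq",  # illustrative tag, not taken from the PR
    structured_output_mode=StructuredOutputMode.json_schema,
    reasoning_capable=True,
    parser=ModelParserID.r1_thinking,  # "ModelParserID" is assumed; add only when validated
),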


1198-1217: Verify Docker Model Runner provider model configurations

The ripgrep command you ran produced no matches, so I can’t confirm whether these entries are distinct model images/tags. Please verify.

  • File: libs/core/kiln_ai/adapters/ml_model_list.py — lines 1198–1217
            KilnModelProvider(
                name=ModelProviderName.docker_model_runner,
                structured_output_mode=StructuredOutputMode.json_schema,
                model_id="ai/llama3.1",
            ),
            KilnModelProvider(
                name=ModelProviderName.docker_model_runner,
                structured_output_mode=StructuredOutputMode.json_schema,
                model_id="ai/llama3.1:latest",
            ),
            KilnModelProvider(
                name=ModelProviderName.docker_model_runner,
                structured_output_mode=StructuredOutputMode.json_schema,
                model_id="ai/llama3.1:8B-Q4_K_M",
            ),
            KilnModelProvider(
                name=ModelProviderName.docker_model_runner,
                structured_output_mode=StructuredOutputMode.json_schema,
                model_id="ai/llama3.1:8B-F16",
            ),

Requested verification steps:

  • Check the Docker/image registry or provider docs to confirm whether these tags point to distinct images.
  • If they are duplicates, collapse to a single canonical tag or add an inline comment explaining why multiple tags are listed.
  • If they are distinct, add brief comments or more descriptive identifiers to avoid confusion.
app/desktop/studio_server/provider_api.py (7)

12-15: LGTM: Clean imports for Docker Model Runner integration.

The imports are properly organized and include the necessary types and functions for Docker Model Runner support.


85-129: LGTM: Well-structured connection function with proper error handling.

The connect_docker_model_runner function follows the established pattern from the Ollama integration with:

  • Proper URL validation
  • Specific exception handling (addresses previous review feedback)
  • HTTPException passthrough to preserve status codes
  • Configuration persistence for custom URLs

The implementation addresses previous feedback about exception handling and parameter naming.
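
For orientation, a minimal sketch of that shape (FastAPI handler style, imports omitted; status codes, the config key, and whether the connection helper accepts a custom URL are assumptions based on the summaries in this review, not the actual implementation):

async def connect_docker_model_runner(
    custom_docker_model_runner_url: str | None = None,
) -> DockerModelRunnerConnection:
    # Validate the URL before attempting a connection
    if custom_docker_model_runner_url and not custom_docker_model_runner_url.startswith(
        ("http://", "https://")
    ):
        raise HTTPException(status_code=400, detail="Invalid URL: must start with http:// or https://")

    try:
        # Passing the custom URL through is an assumption about the helper's signature
        connection = await get_docker_model_runner_connection(custom_docker_model_runner_url)
    except HTTPException:
        raise  # preserve status codes from lower layers
    except (openai.APIConnectionError, httpx.RequestError) as e:
        raise HTTPException(status_code=417, detail=f"Failed to connect to Docker Model Runner: {e}")

    if connection is None:
        raise HTTPException(status_code=417, detail="Failed to connect to Docker Model Runner")

    # Persist a custom URL only when it differs from the current setting
    if custom_docker_model_runner_url and custom_docker_model_runner_url != docker_model_runner_base_url():
        Config.shared().save_setting("docker_model_runner_base_url", custom_docker_model_runner_url)

    return connection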


251-256: LGTM: Clean API endpoint implementation.

The Docker Model Runner connection endpoint follows the same pattern as the Ollama endpoint with appropriate parameter passing.


219-222: Verify provider ordering logic.

The Docker Model Runner models are inserted at position 0, which means they'll appear first in the UI. However, the comment and the subsequent Ollama insertion at position 0 suggest Ollama should maintain the top spot when both are present. The current logic would place Docker Model Runner first, then Ollama.

Should the insertion logic be:

models.insert(1, docker_models)  # Insert at position 1 if Ollama will be at 0

Or is Docker Model Runner intended to have priority over Ollama?
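
If Ollama is meant to stay on top whenever both local providers are present, one way to make that explicit regardless of the surrounding code is to insert Docker Model Runner before Ollama, since the last insert at position 0 wins. A sketch (the Ollama helper name is assumed to mirror the Docker Model Runner one):

docker_models = await available_docker_model_runner_models()
if docker_models:
    models.insert(0, docker_models)

ollama_models = await available_ollama_models()
if ollama_models:
    models.insert(0, ollama_models)  # inserted last, so Ollama ends up first when present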


372-372: LGTM: Proper exclusion from API key workflows.

Docker Model Runner is correctly excluded from both connect and disconnect API key workflows since it doesn't require API keys, following the same pattern as Ollama.

Also applies to: 432-432


1043-1101: LGTM: Well-structured model discovery implementation.

The available_docker_model_runner_models function follows the established pattern from Ollama with:

  • Proper error handling with specific exceptions (addresses previous feedback)
  • Correct mapping from Docker model IDs to Kiln models
  • Appropriate handling of both supported and untested models

1103-1122: LGTM: Clean model matching implementation.

The models_from_docker_model_runner_id function properly matches Docker model IDs to Kiln models and providers, following the same pattern as the Ollama equivalent.
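
The described matching amounts to filtering built_in_models by provider and model_id; a minimal sketch (the real implementation may differ in details):

def models_from_docker_model_runner_id(
    model_id: str,
) -> list[tuple[KilnModel, KilnModelProvider]]:
    matches = []
    for model in built_in_models:
        for provider in model.providers:
            if (
                provider.name == ModelProviderName.docker_model_runner
                and provider.model_id == model_id
            ):
                matches.append((model, provider))
    return matches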

@ilopezluna ilopezluna requested a review from scosman August 15, 2025 13:51
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (9)
libs/core/kiln_ai/adapters/test_provider_tools.py (1)

937-989: Good coverage for DMR provider_enabled; add an “untested-only” case

You cover success, empty models, and exception paths. Since provider_enabled returns True when untested_models > 0 (even if supported_models == 0), add a test asserting that behavior.

Apply this diff to add the missing case:

@@
 @pytest.mark.asyncio
 async def test_provider_enabled_docker_model_runner_connection_error():
@@
     result = await provider_enabled(ModelProviderName.docker_model_runner)
     assert result is False
 
 
+@pytest.mark.asyncio
+async def test_provider_enabled_docker_model_runner_untested_only():
+    """Test provider_enabled for Docker Model Runner when only untested models are present"""
+    with patch(
+        "kiln_ai.adapters.provider_tools.get_docker_model_runner_connection",
+        new_callable=AsyncMock,
+    ) as mock_get_docker:
+        mock_get_docker.return_value = DockerModelRunnerConnection(
+            message="Connected with untested",
+            supported_models=[],
+            untested_models=["some-new-model"],
+        )
+
+        result = await provider_enabled(ModelProviderName.docker_model_runner)
+        assert result is True
+
+
 def test_provider_name_from_id_docker_model_runner():
libs/core/kiln_ai/adapters/test_adapter_registry.py (2)

11-11: Prefer consistent enum usage for structured_output_mode across tests

You import StructuredOutputMode and use it in the new DMR tests. Consider switching the earlier tests that pass "json_schema" as a string to the enum for consistency and static checking.


269-341: Strong adapter tests for DMR; consider a trailing-slash case and reduce duplication

  • You validated default, custom, and None base URL behavior and the “/v1” suffix. Consider also testing when docker_model_runner_base_url ends with a trailing slash to ensure no double slashes in constructed base_url.
  • The three DMR tests share a lot of assertions; you could parametrize them to reduce duplication.

Add a trailing-slash test (optional):

@@
 def test_docker_model_runner_adapter_creation_with_none_url(mock_config, basic_task):
@@
     assert adapter.config.default_headers is None
+
+
+def test_docker_model_runner_adapter_creation_with_trailing_slash(mock_config, basic_task):
+    """Test DMR adapter creation when config URL ends with a trailing slash (no double slashes in /v1)."""
+    mock_config.shared.return_value.docker_model_runner_base_url = (
+        "http://custom:8080/engines/llama.cpp/"
+    )
+
+    adapter = adapter_for_task(
+        kiln_task=basic_task,
+        run_config_properties=RunConfigProperties(
+            model_name="llama_3_2_3b",
+            model_provider_name=ModelProviderName.docker_model_runner,
+            prompt_id="simple_prompt_builder",
+            structured_output_mode=StructuredOutputMode.json_schema,
+        ),
+    )
+
+    assert isinstance(adapter, LiteLlmAdapter)
+    assert adapter.config.base_url == "http://custom:8080/engines/llama.cpp/v1"

If this test fails, we should normalize the base URL in adapter_registry by trimming a trailing slash before appending “/v1”.
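
A sketch of that normalization, assuming the registry builds the URL from docker_model_runner_base_url():

base_url = docker_model_runner_base_url().rstrip("/") + "/v1"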

libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py (3)

34-65: Model parsing test looks good; using the enum avoids string drift

You correctly patched built_in_models and used ModelProviderName.docker_model_runner. Consider also asserting that other known IDs land in untested_models to fully validate the split, though it's not required.


221-275: Minor: error-path coverage overlaps; consider adding an openai.APIError case

You already cover APIConnectionError and RequestError. Adding an openai.APIError case would exercise the remaining except branch.

Example (optional):

@@
 async def test_get_docker_model_runner_connection_http_error():
@@
     assert result is None
+
+
+@pytest.mark.asyncio
+async def test_get_docker_model_runner_connection_openai_api_error():
+    """Test get_docker_model_runner_connection with OpenAI API error."""
+    from kiln_ai.adapters.docker_model_runner_tools import get_docker_model_runner_connection
+    with patch("kiln_ai.adapters.docker_model_runner_tools.openai.OpenAI") as mock_openai:
+        mock_client = Mock()
+        mock_client.models.list.side_effect = openai.APIError(message="boom", request=Mock(), body=None)
+        mock_openai.return_value = mock_client
+        result = await get_docker_model_runner_connection()
+        assert result is None

277-310: Two installation tests could be parametrized to reduce duplication

Not required, but consider parametrizing over presence/absence to keep the file concise.
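
A sketch of how that parametrization could look (the helper's exact signature and async-ness are assumptions; the connection fields match those used elsewhere in these tests):

import pytest
from unittest.mock import AsyncMock, patch

from kiln_ai.adapters.docker_model_runner_tools import (
    DockerModelRunnerConnection,
    docker_model_runner_model_installed,
)


@pytest.mark.asyncio
@pytest.mark.parametrize(
    "installed,model_id,expected",
    [
        (["ai/llama3.1:8B-Q4_K_M"], "ai/llama3.1:8B-Q4_K_M", True),
        (["ai/llama3.1:8B-Q4_K_M"], "ai/not-installed", False),
    ],
)
async def test_docker_model_runner_model_installed(installed, model_id, expected):
    connection = DockerModelRunnerConnection(
        message="Connected", supported_models=installed, untested_models=[]
    )
    with patch(
        "kiln_ai.adapters.docker_model_runner_tools.get_docker_model_runner_connection",
        new_callable=AsyncMock,
        return_value=connection,
    ):
        # Assumes the helper takes a model id and checks it against the connection's models
        assert await docker_model_runner_model_installed(model_id) is expected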

app/desktop/studio_server/test_provider_api.py (3)

5-5: Remove redundant import of httpx

Line 392 imports httpx again when it's already imported on line 5. Remove the duplicate import.

@@ -389,8 +389,6 @@
 @pytest.mark.asyncio
 async def test_connect_docker_model_runner_http_error():
     """Test connect_docker_model_runner with HTTP errors"""
-    import httpx
-
     with patch(
         "app.desktop.studio_server.provider_api.get_docker_model_runner_connection"
     ) as mock_get_connection:

Also applies to: 7-7


583-583: Consider using a constant for expected model count

The assertion assert len(result.models) == 3 on line 583 depends on the specific test setup (2 supported + 1 untested). Consider using a more explicit assertion or comment to clarify the expected count.

-        assert len(result.models) == 3  # 2 supported + 1 untested
+        expected_model_count = len(mock_connection.supported_models) + len(mock_connection.untested_models)
+        assert len(result.models) == expected_model_count  # 2 supported + 1 untested

Also applies to: 610-610


592-593: Consider asserting on untested_model field explicitly

For completeness, consider asserting on the untested_model field for supported models to ensure it's explicitly False (not just falsy).

-        assert result.models[0].untested_model is False
+        assert result.models[0].untested_model is False  # Supported models should not be marked as untested
 
         # Check second supported model
         assert result.models[1].id == "supported2"
         assert result.models[1].name == "Supported 2"
         assert result.models[1].supports_structured_output is False
         assert (
             result.models[1].structured_output_mode == StructuredOutputMode.json_schema
         )
-        assert result.models[1].untested_model is False
+        assert result.models[1].untested_model is False  # Supported models should not be marked as untested

Also applies to: 600-601

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 40a8d67 and 5769339.

📒 Files selected for processing (4)
  • app/desktop/studio_server/test_provider_api.py (4 hunks)
  • libs/core/kiln_ai/adapters/test_adapter_registry.py (3 hunks)
  • libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py (1 hunks)
  • libs/core/kiln_ai/adapters/test_provider_tools.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)

**/*.py: Always assume pydantic 2 (not pydantic 1)
The project supports Python 3.10 and above

Files:

  • libs/core/kiln_ai/adapters/test_adapter_registry.py
  • libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py
  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • app/desktop/studio_server/test_provider_api.py
**/test_*.py

📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)

**/test_*.py: Always use pytest for tests in Python code
Test brevity is important. Use approaches for re-use and brevity including using fixtures for repeated code, and using pytest parameterize for similar tests

Files:

  • libs/core/kiln_ai/adapters/test_adapter_registry.py
  • libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py
  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • app/desktop/studio_server/test_provider_api.py
🧠 Learnings (1)
📚 Learning: 2025-08-08T15:50:45.334Z
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:221-228
Timestamp: 2025-08-08T15:50:45.334Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.KilnModelProvider, naming policy: keep cross-provider feature flags unprefixed (e.g., reasoning_optional_for_structured_output) to allow reuse across providers; prefix provider-specific toggles with the provider name (e.g., siliconflow_enable_thinking) when the implementation is specific and potentially unsafe to generalize.

Applied to files:

  • app/desktop/studio_server/test_provider_api.py
🧬 Code Graph Analysis (4)
libs/core/kiln_ai/adapters/test_adapter_registry.py (5)
libs/core/kiln_ai/datamodel/datamodel_enums.py (2)
  • StructuredOutputMode (23-45)
  • ModelProviderName (80-102)
libs/core/kiln_ai/adapters/docker_model_runner_tools.py (1)
  • docker_model_runner_base_url (11-22)
libs/core/kiln_ai/adapters/test_provider_tools.py (1)
  • mock_config (42-44)
libs/core/kiln_ai/datamodel/task.py (1)
  • RunConfigProperties (48-82)
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (1)
  • LiteLlmAdapter (27-441)
libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py (1)
libs/core/kiln_ai/adapters/docker_model_runner_tools.py (7)
  • DockerModelRunnerConnection (43-50)
  • docker_model_runner_base_url (11-22)
  • parse_docker_model_runner_models (54-87)
  • all_models (49-50)
  • docker_model_runner_online (25-40)
  • get_docker_model_runner_connection (90-112)
  • docker_model_runner_model_installed (115-119)
libs/core/kiln_ai/adapters/test_provider_tools.py (2)
libs/core/kiln_ai/adapters/provider_tools.py (2)
  • provider_enabled (28-53)
  • provider_name_from_id (342-398)
libs/core/kiln_ai/datamodel/datamodel_enums.py (1)
  • ModelProviderName (80-102)
app/desktop/studio_server/test_provider_api.py (5)
libs/core/kiln_ai/datamodel/datamodel_enums.py (2)
  • StructuredOutputMode (23-45)
  • ModelProviderName (80-102)
app/desktop/studio_server/provider_api.py (3)
  • connect_docker_model_runner (85-128)
  • available_docker_model_runner_models (1043-1100)
  • models_from_docker_model_runner_id (1103-1122)
libs/core/kiln_ai/adapters/docker_model_runner_tools.py (1)
  • docker_model_runner_base_url (11-22)
libs/core/kiln_ai/utils/config.py (1)
  • save_setting (247-248)
libs/core/kiln_ai/adapters/ml_model_list.py (2)
  • KilnModel (245-260)
  • KilnModelProvider (190-242)
🔇 Additional comments (9)
libs/core/kiln_ai/adapters/test_provider_tools.py (1)

5-5: Import looks correct and scoped to the tests that need it

DockerModelRunnerConnection is used only in the new DMR tests. Good, tight import.

libs/core/kiln_ai/adapters/test_adapter_registry.py (1)

21-23: Nice: default DMR base URL pinned in the mock Config

This makes the DMR tests deterministic and aligned with the implementation default.

libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py (2)

90-111: Service “online” checks are well-isolated with AsyncClient patching

Good use of AsyncMock and context manager patching for httpx.AsyncClient.


131-176: Connection success test thoroughly verifies OpenAI client init and parse call

Asserting base_url composition and parse invocation is spot on.

app/desktop/studio_server/test_provider_api.py (5)

349-479: Well-structured test coverage for Docker Model Runner connection logic

The connection test functions comprehensively cover all failure scenarios: invalid URL format, connection failures, API errors (OpenAI APIConnectionError), HTTP errors (httpx.RequestError), and HTTPException preservation. The URL persistence logic is properly tested for both custom and unchanged URLs.


481-523: LGTM! API endpoint tests are well-structured

The API endpoint tests properly verify that the /api/provider/docker_model_runner/connect endpoint correctly invokes connect_docker_model_runner with both default (None) and custom URL parameters.


525-611: Comprehensive test coverage for available Docker Model Runner models

The test suite thoroughly covers Docker Model Runner model discovery, including:

  • Successful connection with both supported and untested models
  • Correct mapping of model attributes (id, name, supports_structured_output, structured_output_mode)
  • Proper handling of untested models with the untested_model flag

613-707: Excellent error handling test coverage

The test suite properly covers all error scenarios for available_docker_model_runner_models:

  • No connection available
  • Empty model lists
  • API errors (openai.APIConnectionError)
  • HTTP errors (httpx.RequestError)
  • HTTPException handling

All error cases correctly return None as expected.


709-784: Thorough test for models_from_docker_model_runner_id function

The test comprehensively validates:

  • Matching Docker Model Runner IDs return correct (KilnModel, KilnModelProvider) tuples
  • Multi-provider models correctly filter to Docker Model Runner provider only
  • Non-matching IDs return empty lists
  • Models without Docker Model Runner provider are excluded

Collaborator

@scosman scosman left a comment


Looking good! A few tiny changes, then good to go:

  • We only support one entry per provider for each model in ml_model_list (see comment). I should have had a test.
  • One small import fix in the tests.

model_id="ai/llama3.2:3B-Q4_K_M",
structured_output_mode=StructuredOutputMode.json_schema,
supports_data_gen=False,
),
Collaborator


Ah, so we can only have 1 entry per provider in there. I'll add a test to check that. If we add multiple, "Llama 8B on DMR" becomes ambiguous, which isn't good for reproducibility.

Let's just use the Q4_K_M versions (or similar)? The user can still use the others; they will automatically appear in the dropdown, but with their own more specific IDs.

@ilopezluna ilopezluna requested a review from scosman August 15, 2025 19:48
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (1)
app/desktop/studio_server/test_generate_docs.py (1)

44-45: Bug: Comparing enum value (str) to enum instance always returns False

This condition will never match because provider.name.value is a string and ModelProviderName.fireworks_ai is an enum. As a result, fireworks finetune support is never detected via the API list.

Fix by comparing enums (or values) consistently.

Apply this diff:

-        if provider.name.value == ModelProviderName.fireworks_ai:
+        if provider.name == ModelProviderName.fireworks_ai:
🧹 Nitpick comments (5)
app/desktop/studio_server/test_generate_docs.py (1)

55-60: Async test should be marked for pytest

async def test_generate_model_table() will error if run without an event loop; it’s currently only marked as paid. Add @pytest.mark.asyncio so it runs correctly when explicitly selected.

Apply this diff:

 @pytest.mark.paid(reason="Marking as paid so it isn't run by default")
+@pytest.mark.asyncio
 async def test_generate_model_table():
libs/core/kiln_ai/adapters/ml_model_list.py (2)

2410-2415: R1-style “thinking” models on DMR: verify parser requirements

Some DMR reasoning-capable entries don’t set an R1 parser, while analogous providers (e.g., OpenRouter/Fireworks/Together) do:

  • Qwen3 0.6B (DMR) — reasoning_capable=True, no parser
  • Qwen3 8B (DMR) — reasoning_capable=True, no parser
  • Qwen3 14B (DMR) — reasoning_capable=True, no parser
  • Qwen3 30B A3B (DMR) — reasoning_capable=True, no parser

Conversely:

  • QwQ 32B (DMR) sets parser=r1_thinking (consistent with other providers)
  • DeepSeek R1 Distill Llama 70B/8B (DMR) set parser=r1_thinking

Given past learnings: parser usage for R1-style output is provider-specific and must be backed by observed responses. Please confirm DMR’s OpenAI-compatible responses for these Qwen3 variants:

  • If DMR returns … (or similar) interleaved content, set parser=r1_thinking.
  • If DMR normalizes outputs without requiring a parser, keep as-is.

Optionally add minimal tests that feed sample DMR responses for these models into the parser layer to lock correctness.

Also applies to: 2549-2554, 2620-2625, 2672-2676


3277-3283: Avoid mutable default argument in utility function

disallowed_modes=[] is a mutable default. Use None and initialize inside the function.

Apply this diff:

 def default_structured_output_mode_for_model_provider(
     model_name: str,
     provider: ModelProviderName,
     default: StructuredOutputMode = StructuredOutputMode.default,
-    disallowed_modes: List[StructuredOutputMode] = [],
+    disallowed_modes: List[StructuredOutputMode] | None = None,
 ) -> StructuredOutputMode:
@@
-    try:
+    if disallowed_modes is None:
+        disallowed_modes = []
+
+    try:
         # Convert string to ModelName enum
         model_name_enum = ModelName(model_name)

Also applies to: 3286-3294

libs/core/kiln_ai/adapters/test_ml_model_list.py (2)

193-227: Enforce “one entry per provider per model” — solid guardrail; minor refinement

Great addition; it aligns with the earlier discussion about avoiding ambiguity. Prefer pytest.fail over assert False for clarity.

Apply this diff:

-            assert False, (
-                f"Model {model.name} has duplicate providers:\n"
-                f"Expected: 1 entry per provider\n"
-                f"Found: {len(provider_names)} total entries, {len(unique_provider_names)} unique providers\n"
-                f"Duplicates: {', '.join(duplicate_details)}\n"
-                f"This suggests either:\n"
-                f"1. A bug where the same provider is accidentally duplicated, or\n"
-                f"2. Intentional design where the same provider offers different model variants\n"
-                f"If this is intentional, the test should be updated to allow multiple entries per provider."
-            )
+            pytest.fail(
+                "Model {name} has duplicate providers:\n"
+                "Expected: 1 entry per provider\n"
+                "Found: {total} total entries, {unique} unique providers\n"
+                "Duplicates: {dups}\n"
+                "This suggests either:\n"
+                "1. A bug where the same provider is accidentally duplicated, or\n"
+                "2. Intentional design where the same provider offers different model variants\n"
+                "If this is intentional, the test should be updated to allow multiple entries per provider."
+                .format(
+                    name=model.name,
+                    total=len(provider_names),
+                    unique=len(unique_provider_names),
+                    dups=", ".join(duplicate_details),
+                )
+            )

150-156: Docstring mismatch with implementation

The test says “model with single provider” but uses gpt_4_1_nano, which has multiple providers. The logic is correct (it just grabs the first provider), but the docstring should be updated to avoid confusion.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5769339 and 15b9325.

📒 Files selected for processing (4)
  • app/desktop/studio_server/test_generate_docs.py (1 hunks)
  • libs/core/kiln_ai/adapters/ml_model_list.py (19 hunks)
  • libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py (1 hunks)
  • libs/core/kiln_ai/adapters/test_ml_model_list.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • libs/core/kiln_ai/adapters/test_docker_model_runner_tools.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)

**/*.py: Always assume pydantic 2 (not pydantic 1)
The project supports Python 3.10 and above

Files:

  • app/desktop/studio_server/test_generate_docs.py
  • libs/core/kiln_ai/adapters/test_ml_model_list.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
**/test_*.py

📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)

**/test_*.py: Always use pytest for tests in Python code
Test brevity is important. Use approaches for re-use and brevity including using fixtures for repeated code, and using pytest parameterize for similar tests

Files:

  • app/desktop/studio_server/test_generate_docs.py
  • libs/core/kiln_ai/adapters/test_ml_model_list.py
🧠 Learnings (5)
📚 Learning: 2025-08-08T16:13:26.526Z
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.

Applied to files:

  • app/desktop/studio_server/test_generate_docs.py
  • libs/core/kiln_ai/adapters/test_ml_model_list.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T15:50:45.334Z
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:221-228
Timestamp: 2025-08-08T15:50:45.334Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.KilnModelProvider, naming policy: keep cross-provider feature flags unprefixed (e.g., reasoning_optional_for_structured_output) to allow reuse across providers; prefix provider-specific toggles with the provider name (e.g., siliconflow_enable_thinking) when the implementation is specific and potentially unsafe to generalize.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:19:20.074Z
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1736-1741
Timestamp: 2025-08-08T16:19:20.074Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), for ModelName.qwq_32b with ModelProviderName.siliconflow_cn (model_id "Qwen/QwQ-32B"), do not set a parser (r1_thinking/optional_r1_thinking). Verified via tests/responses that SiliconFlow returns outputs that parse correctly without an R1 parser. Parser usage remains provider-specific and should only be added when validated.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:14:54.346Z
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1979-1983
Timestamp: 2025-08-08T16:14:54.346Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py, for ModelName.deepseek_r1_distill_qwen_32b on provider siliconflow_cn (model_id "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"), do not set a parser (r1_thinking/optional_r1_thinking). Verified via testing that SiliconFlow returns final outputs that parse without an R1 parser, and reasoning may be omitted under structured output; rely on reasoning_optional_for_structured_output=True.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:15:20.796Z
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:2434-2439
Timestamp: 2025-08-08T16:15:20.796Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), for ModelName.qwen_3_8b_no_thinking with ModelProviderName.siliconflow_cn (model_id "Qwen/Qwen3-8B"), the qwen3_style_no_think formatter is not required: SiliconFlow’s non‑thinking Qwen3‑8B returns clean output without <answer> wrappers. Formatter/parser usage is provider-specific and should be added only when verified by tests/responses.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
🧬 Code Graph Analysis (2)
app/desktop/studio_server/test_generate_docs.py (1)
libs/core/kiln_ai/adapters/ml_model_list.py (1)
  • KilnModelProvider (190-242)
libs/core/kiln_ai/adapters/ml_model_list.py (3)
libs/core/kiln_ai/datamodel/datamodel_enums.py (2)
  • ModelProviderName (80-102)
  • StructuredOutputMode (23-45)
app/web_ui/src/lib/types.ts (2)
  • ModelProviderName (43-43)
  • StructuredOutputMode (42-42)
libs/core/kiln_ai/adapters/parsers/test_r1_parser.py (1)
  • parser (8-9)
🔇 Additional comments (5)
app/desktop/studio_server/test_generate_docs.py (1)

5-10: Import path consolidation LGTM

Switching to kiln_ai.adapters.* and provider_tools simplifies imports and matches the new adapter layout.

libs/core/kiln_ai/adapters/ml_model_list.py (3)

96-96: New ModelName entry: Gemma 3 270M — LGTM

Consistent naming with existing convention (0p27b) and used below in the model list.


1198-1203: DMR provider additions with explicit model IDs — good coverage and explicitness

  • Clear, explicit tags (weights/quantization) avoid “latest” ambiguity and support reproducibility.
  • Broad coverage across Llama, Mistral, Gemma, Phi, Qwen, DeepSeek — aligned with the PR objective.

No functional concerns on these entries as-is.

Also applies to: 1319-1324, 1374-1380, 1406-1411, 1556-1560, 1642-1655, 1669-1673, 1692-1695, 1776-1781, 1844-1850, 1869-1871, 2129-2135, 2209-2217, 2410-2415, 2549-2554, 2620-2625, 2672-2676


1406-1411: Confirm structured output claims for Llama 3.2 3B on DMR

OpenRouter entry for 3B marks structured output as unreliable; DMR sets structured_output_mode=json_schema and omits supports_structured_output=False. If DMR reliably enforces JSON schema for this model, great — if not, consider aligning capabilities (e.g., supports_structured_output=False or json_instruction_and_object) until validated.

libs/core/kiln_ai/adapters/test_ml_model_list.py (1)

1-1: Import for Counter — LGTM

Needed for the new duplicate-provider test.
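
For reference, the duplicate-provider check this import supports boils down to roughly the shape below (a sketch; the actual test's failure message is more detailed, as shown earlier in this review):

from collections import Counter

import pytest

from kiln_ai.adapters.ml_model_list import built_in_models


def test_unique_providers_per_model():
    for model in built_in_models:
        counts = Counter(provider.name for provider in model.providers)
        duplicates = {name: n for name, n in counts.items() if n > 1}
        if duplicates:
            pytest.fail(f"Model {model.name} has duplicate provider entries: {duplicates}")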

Collaborator

@scosman scosman left a comment


Great work! We can merge as soon as CI passes.

KilnModel(
family=ModelFamily.gemma,
name=ModelName.gemma_3_0p27b,
friendly_name="Gemma 3 270M",
Collaborator


That was fast!

)


def test_unique_providers_per_model():
Collaborator


Thanks!

@scosman scosman merged commit c6824ef into Kiln-AI:main Aug 15, 2025
5 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Sep 10, 2025