
Conversation


@luis5tb luis5tb commented Nov 5, 2025

Description

Extend the Responses API support (v2 endpoints) by adding the option to use shields.

Note that there is a limitation in LlamaStack: the same shields must be used for both input and output.
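For illustration, here is a minimal sketch (not the PR's exact code) of how discovered shields end up on a Responses API call; it assumes the AsyncLlamaStackClient used elsewhere in this repository, and model_id/query are hypothetical variables resolved earlier in the endpoint:

# Hedged sketch: discover shield IDs at request time and forward them as
# extra_body guardrails; the same shield list applies to input and output.
available_shields = [shield.identifier for shield in await client.shields.list()]

create_kwargs = {
    "model": model_id,   # hypothetical, selected earlier in the endpoint
    "input": query,      # hypothetical user query
    "stream": False,
}
if available_shields:
    create_kwargs["extra_body"] = {"guardrails": available_shields}

response = await client.responses.create(**create_kwargs)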

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Runtime discovery and integration of safety shields into query and streaming flows
    • Shield-based guardrails included in API request handling when shields are present
    • Shield violation detection that increments validation error metrics and emits warnings
  • Tests

    • Expanded coverage for shield availability, guardrail propagation, and violation detection across query and streaming endpoints


coderabbitai bot commented Nov 5, 2025

Walkthrough

Adds runtime shield discovery to Responses API v2 endpoints: queries available shields at request time, attaches shield IDs as extra_body.guardrails when present, and inspects response output items for refusal messages to increment a validation-error metric; tests expanded to cover shield presence/absence and violation detection.

Changes

Cohort / File(s) and summary:

  • Endpoints (sync + streaming): src/app/endpoints/query_v2.py, src/app/endpoints/streaming_query_v2.py
    Call get_available_shields(client) (wraps client.shields.list()), include extra_body.guardrails with shield IDs when creating Responses API requests, and run detect_shield_violations against response output items to increment llm_calls_validation_errors_total on refusal detections; minor docstring/log updates.
  • Shield utilities: src/utils/shields.py
    New module exposing get_available_shields(client: AsyncLlamaStackClient) -> list[str] and detect_shield_violations(output_items: list[Any]) -> bool with logging and metric increment on detected refusals.
  • Unit tests (query v2): tests/unit/app/endpoints/test_query_v2.py
    Add mocks for shields.list() across tests; new tests asserting guardrail propagation when shields are present, absence when none, and metric increment on detected shield refusal; adjust existing mocks to avoid false negatives.
  • Unit tests (streaming v2): tests/unit/app/endpoints/test_streaming_query_v2.py
    Mock shields.list() in existing tests; new tests for guardrail propagation (present/absent), streaming-path shield violation detection (metric increment), and non-violation cases; adjust streaming mocks and SSE sequencing in tests.
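A rough sketch of the new src/utils/shields.py module, based on the signatures summarized above (the merged code may differ in details, and the llama_stack_client import path is assumed):

"""Utilities for discovering safety shields and detecting shield violations."""

import logging
from typing import Any

from llama_stack_client import AsyncLlamaStackClient  # assumed import path

import metrics  # project-local metrics module used throughout this PR

logger = logging.getLogger(__name__)


async def get_available_shields(client: AsyncLlamaStackClient) -> list[str]:
    """Return identifiers of all shields registered in Llama Stack."""
    available_shields = [shield.identifier for shield in await client.shields.list()]
    if not available_shields:
        logger.info("No available shields. Disabling safety")
    else:
        logger.info("Available shields: %s", available_shields)
    return available_shields


def detect_shield_violations(output_items: list[Any]) -> bool:
    """Increment the validation-error metric if a message item carries a refusal."""
    for output_item in output_items:
        if getattr(output_item, "type", None) == "message":
            refusal = getattr(output_item, "refusal", None)
            if refusal:
                metrics.llm_calls_validation_errors_total.inc()
                logger.warning("Shield violation detected: %s", refusal)
                return True
    return False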

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant H as Endpoint Handler
    participant S as Shields util
    participant R as Responses API
    participant P as Processor

    C->>H: QueryRequest v2

    rect rgb(240, 250, 240)
    Note over H,S: Discover shields
    H->>S: get_available_shields(client)
    S-->>H: [shield_id,...] or []
    end

    rect rgb(240, 240, 255)
    Note over H,R: Create Responses request
    alt Shields present
        H->>R: create(..., extra_body: { guardrails: [ids] })
    else No shields
        H->>R: create(..., no extra_body.guardrails)
    end
    end

    R-->>H: response (output items...)

    rect rgb(255, 250, 230)
    Note over H,P: Inspect outputs for refusals
    loop per output item
        alt message with refusal
            P->>P: detect_shield_violations -> increment llm_calls_validation_errors_total
            P->>H: log shield violation
        else
            P->>H: normal processing
        end
    end
    end

    H-->>C: Final response / stream

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Pay attention to error handling from client.shields.list() and how failures are logged/propagated.
  • Verify extra_body.guardrails shape matches Responses API expectations.
  • Confirm detect_shield_violations reliably identifies refusal messages without false positives.
  • Review tests for correct mocking of client.shields.list() and metrics assertions.

Possibly related PRs

Suggested reviewers

  • tisnik
  • manstis

Pre-merge checks

❌ Failed checks (1 inconclusive)

  • Title check: ❓ Inconclusive. The title is vague and uses a work-in-progress indicator without providing clear specifics about the main change. Consider a more descriptive title that clearly explains the feature, such as 'Add shield support to Responses v2 endpoints' or 'Enable shields integration in Responses API v2'.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 92.59%, which is above the required threshold of 80.00%.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/query_v2.py (1)

468-558: Prevent double-counting llm_calls_total.

extract_token_usage_from_responses_api always bumps llm_calls_total, but the shared streaming base already increments that metric before yielding the SSE stream. For the Responses API path this results in every request being counted twice. Please make the metric increment optional so we can disable it for streaming callers while keeping the existing behavior for synchronous flows.

-def extract_token_usage_from_responses_api(
-    response: OpenAIResponseObject,
-    model: str,
-    provider: str,
-    system_prompt: str = "",  # pylint: disable=unused-argument
-) -> TokenCounter:
+def extract_token_usage_from_responses_api(
+    response: OpenAIResponseObject,
+    model: str,
+    provider: str,
+    system_prompt: str = "",  # pylint: disable=unused-argument
+    *,
+    increment_llm_call_metric: bool = True,
+) -> TokenCounter:
@@
-                # Update Prometheus metrics only when we have actual usage data
-                try:
-                    metrics.llm_token_sent_total.labels(provider, model).inc(
-                        token_counter.input_tokens
-                    )
-                    metrics.llm_token_received_total.labels(provider, model).inc(
-                        token_counter.output_tokens
-                    )
-                except (AttributeError, TypeError, ValueError) as e:
-                    logger.warning("Failed to update token metrics: %s", e)
-                _increment_llm_call_metric(provider, model)
+                # Update Prometheus metrics only when we have actual usage data
+                try:
+                    metrics.llm_token_sent_total.labels(provider, model).inc(
+                        token_counter.input_tokens
+                    )
+                    metrics.llm_token_received_total.labels(provider, model).inc(
+                        token_counter.output_tokens
+                    )
+                except (AttributeError, TypeError, ValueError) as e:
+                    logger.warning("Failed to update token metrics: %s", e)
+                if increment_llm_call_metric:
+                    _increment_llm_call_metric(provider, model)
@@
-                _increment_llm_call_metric(provider, model)
+                if increment_llm_call_metric:
+                    _increment_llm_call_metric(provider, model)
@@
-        _increment_llm_call_metric(provider, model)
+        if increment_llm_call_metric:
+            _increment_llm_call_metric(provider, model)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4f39c33 and fb25b8c.

📒 Files selected for processing (8)
  • src/app/endpoints/query_v2.py (7 hunks)
  • src/app/endpoints/streaming_query.py (6 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/app/routers.py (2 hunks)
  • src/models/context.py (1 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (9 hunks)
  • tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
  • tests/unit/app/test_routers.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (9)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/routers.py
  • src/models/context.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query_v2.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/routers.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query_v2.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/routers.py
  • src/models/context.py
  • src/app/endpoints/streaming_query_v2.py
  • tests/unit/app/endpoints/test_query_v2.py
  • tests/unit/app/test_routers.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query_v2.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/routers.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query_v2.py
src/{models/**/*.py,configuration.py}

📄 CodeRabbit inference engine (CLAUDE.md)

src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)

Files:

  • src/models/context.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation

Files:

  • src/models/context.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query_v2.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_query_v2.py
  • tests/unit/app/test_routers.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_query_v2.py
  • tests/unit/app/test_routers.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
🧬 Code graph analysis (7)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
  • include_router (37-52)
src/models/context.py (1)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/app/endpoints/streaming_query_v2.py (12)
src/app/database.py (1)
  • get_session (34-40)
src/app/endpoints/query.py (3)
  • is_transcripts_enabled (98-104)
  • persist_user_conversation_details (107-139)
  • validate_attachments_metadata (801-830)
src/app/endpoints/query_v2.py (4)
  • extract_token_usage_from_responses_api (468-558)
  • get_topic_summary (235-274)
  • prepare_tools_for_responses_api (634-685)
  • retrieve_response (304-444)
src/app/endpoints/streaming_query.py (6)
  • format_stream_data (126-137)
  • stream_end_event (164-220)
  • stream_start_event (140-161)
  • streaming_query_endpoint_handler_base (851-983)
  • response_generator (716-846)
  • retrieve_response (1018-1139)
src/models/cache_entry.py (1)
  • CacheEntry (7-24)
src/models/context.py (1)
  • ResponseGeneratorContext (12-48)
src/models/responses.py (2)
  • ForbiddenResponse (1120-1142)
  • UnauthorizedResponse (1094-1117)
src/utils/endpoints.py (2)
  • create_referenced_documents_with_metadata (563-577)
  • store_conversation_into_cache (231-251)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/transcripts.py (1)
  • store_transcript (40-99)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
tests/unit/app/endpoints/test_query_v2.py (2)
src/models/config.py (1)
  • ModelContextProtocolServer (169-174)
src/app/endpoints/query_v2.py (2)
  • get_mcp_tools (592-631)
  • retrieve_response (304-444)
src/app/endpoints/streaming_query.py (8)
src/models/context.py (1)
  • ResponseGeneratorContext (12-48)
src/utils/endpoints.py (4)
  • get_system_prompt (126-190)
  • create_rag_chunks_dict (383-396)
  • create_referenced_documents_with_metadata (563-577)
  • store_conversation_into_cache (231-251)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/utils/token_counter.py (2)
  • extract_token_usage_from_turn (44-94)
  • TokenCounter (18-41)
src/app/endpoints/query.py (1)
  • persist_user_conversation_details (107-139)
src/utils/transcripts.py (1)
  • store_transcript (40-99)
src/app/database.py (1)
  • get_session (34-40)
src/models/database/conversations.py (1)
  • UserConversation (11-38)
src/app/endpoints/query_v2.py (2)
src/configuration.py (3)
  • configuration (73-77)
  • AppConfig (39-181)
  • mcp_servers (101-105)
src/models/requests.py (1)
  • QueryRequest (73-225)
tests/unit/app/endpoints/test_streaming_query_v2.py (4)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/models/config.py (3)
  • config (140-146)
  • Action (329-375)
  • ModelContextProtocolServer (169-174)
src/app/endpoints/streaming_query_v2.py (2)
  • retrieve_response (397-478)
  • streaming_query_endpoint_handler_v2 (367-394)
src/configuration.py (1)
  • mcp_servers (101-105)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request

Comment on lines 134 to 275
chunk_id = 0
summary = TurnSummary(llm_response="No response from the model", tool_calls=[])

# Determine media type for response formatting
media_type = context.query_request.media_type or MEDIA_TYPE_JSON

# Accumulators for Responses API
text_parts: list[str] = []
tool_item_registry: dict[str, dict[str, str]] = {}
emitted_turn_complete = False

# Handle conversation id and start event in-band on response.created
conv_id = context.conversation_id

# Track the latest response object from response.completed event
latest_response_object: Any | None = None

logger.debug("Starting streaming response (Responses API) processing")

async for chunk in turn_response:
    event_type = getattr(chunk, "type", None)
    logger.debug("Processing chunk %d, type: %s", chunk_id, event_type)

    # Emit start on response.created
    if event_type == "response.created":
        try:
            conv_id = getattr(chunk, "response").id
        except Exception:  # pylint: disable=broad-except
            conv_id = ""
        yield stream_start_event(conv_id)
        continue

    # Text streaming
    if event_type == "response.output_text.delta":
        delta = getattr(chunk, "delta", "")
        if delta:
            text_parts.append(delta)
            yield format_stream_data(
                {
                    "event": "token",
                    "data": {
                        "id": chunk_id,
                        "token": delta,
                    },
                }
            )
            chunk_id += 1

    # Final text of the output (capture, but emit at response.completed)
    elif event_type == "response.output_text.done":
        final_text = getattr(chunk, "text", "")
        if final_text:
            summary.llm_response = final_text

    # Content part started - emit an empty token to kick off UI streaming if desired
    elif event_type == "response.content_part.added":
        yield format_stream_data(
            {
                "event": "token",
                "data": {
                    "id": chunk_id,
                    "token": "",
                },
            }
        )
        chunk_id += 1

    # Track tool call items as they are added so we can build a summary later
    elif event_type == "response.output_item.added":
        item = getattr(chunk, "item", None)
        item_type = getattr(item, "type", None)
        if item and item_type == "function_call":
            item_id = getattr(item, "id", "")
            name = getattr(item, "name", "function_call")
            call_id = getattr(item, "call_id", item_id)
            if item_id:
                tool_item_registry[item_id] = {
                    "name": name,
                    "call_id": call_id,
                }

    # Stream tool call arguments as tool_call events
    elif event_type == "response.function_call_arguments.delta":
        delta = getattr(chunk, "delta", "")
        yield format_stream_data(
            {
                "event": "tool_call",
                "data": {
                    "id": chunk_id,
                    "role": "tool_execution",
                    "token": delta,
                },
            }
        )
        chunk_id += 1

    # Finalize tool call arguments and append to summary
    elif event_type in (
        "response.function_call_arguments.done",
        "response.mcp_call.arguments.done",
    ):
        item_id = getattr(chunk, "item_id", "")
        arguments = getattr(chunk, "arguments", "")
        meta = tool_item_registry.get(item_id, {})
        summary.tool_calls.append(
            ToolCallSummary(
                id=meta.get("call_id", item_id or "unknown"),
                name=meta.get("name", "tool_call"),
                args=arguments,
                response=None,
            )
        )

    # Completed response - capture final text and response object
    elif event_type == "response.completed":
        # Capture the response object for token usage extraction
        latest_response_object = getattr(chunk, "response", None)

        # Check for shield violations in the completed response
        if latest_response_object:
            for output_item in getattr(latest_response_object, "output", []):
                item_type = getattr(output_item, "type", None)
                if item_type == "message":
                    refusal = getattr(output_item, "refusal", None)
                    if refusal:
                        # Metric for LLM validation errors (shield violations)
                        metrics.llm_calls_validation_errors_total.inc()
                        logger.warning("Shield violation detected: %s", refusal)

        if not emitted_turn_complete:
            final_message = summary.llm_response or "".join(text_parts)
            yield format_stream_data(
                {
                    "event": "turn_complete",
                    "data": {
                        "id": chunk_id,
                        "token": final_message,
                    },
                }
            )
            chunk_id += 1
            emitted_turn_complete = True


⚠️ Potential issue | 🟠 Major

Persist the streamed text into the summary.

When the Responses API only emits response.output_text.delta chunks (and never provides a non-empty response.output_text.done), summary.llm_response stays at the placeholder value. We then emit "No response from the model" in the final turn_complete event and store the wrong text in transcripts/cache. Capture the accumulated deltas in the summary before finishing the turn.

-        summary = TurnSummary(llm_response="No response from the model", tool_calls=[])
+        summary = TurnSummary(llm_response="", tool_calls=[])
@@
-                if not emitted_turn_complete:
-                    final_message = summary.llm_response or "".join(text_parts)
+                if not emitted_turn_complete:
+                    final_message = summary.llm_response or "".join(text_parts)
+                    summary.llm_response = final_message
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query_v2.py around lines 134-276, the
summary.llm_response is left as the placeholder when only
response.output_text.delta chunks are received; before emitting the final
turn_complete (inside the response.completed handling) set summary.llm_response
= summary.llm_response if it already contains a non-placeholder value else
"".join(text_parts) (or simply assign "".join(text_parts) if
summary.llm_response is the placeholder) so the accumulated token deltas are
persisted into the summary and used for the final emitted token and stored
transcripts/cache.

Comment on lines 284 to 296

# Extract token usage from the response object
token_usage = (
    extract_token_usage_from_responses_api(
        latest_response_object, context.model_id, context.provider_id
    )
    if latest_response_object is not None
    else TokenCounter()
)

yield stream_end_event(context.metadata_map, summary, token_usage, media_type)

if not is_transcripts_enabled():

⚠️ Potential issue | 🟠 Major

Avoid incrementing llm_calls_total twice.

After the streaming base already increments metrics.llm_calls_total, calling extract_token_usage_from_responses_api here triggers a second increment. Once the helper accepts an increment_llm_call_metric flag, pass False from the streaming path so the metric stays accurate while we still update token counters.

-            extract_token_usage_from_responses_api(
-                latest_response_object, context.model_id, context.provider_id
-            )
+            extract_token_usage_from_responses_api(
+                latest_response_object,
+                context.model_id,
+                context.provider_id,
+                increment_llm_call_metric=False,
+            )

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query_v2.py around lines 284 to 296, calling
extract_token_usage_from_responses_api here causes metrics.llm_calls_total to be
incremented a second time because the streaming base already incremented it;
update the call to pass increment_llm_call_metric=False (once the helper accepts
that flag) so the helper extracts token counters without bumping the LLM call
metric, ensuring only token counters are updated and metrics.llm_calls_total
remains accurate.

@luis5tb luis5tb force-pushed the responses_v2_shields branch from fb25b8c to b669b32 on November 14, 2025 at 08:09

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
src/app/endpoints/streaming_query_v2.py (1)

285-288: Be aware of double‑counting llm_calls_total in streaming (already flagged earlier).

extract_token_usage_from_responses_api internally increments metrics.llm_calls_total, and the streaming base handler also increments that metric; calling it here effectively bumps llm_calls_total twice per streaming call. A previous review already suggested adding an increment_llm_call_metric flag to the helper and passing False from this path to avoid double-counting.

Not a new regression in this diff, but still worth addressing in a follow-up.

🧹 Nitpick comments (3)
tests/unit/app/endpoints/test_query_v2.py (1)

618-661: Shield violation metric test aligns with implementation.

This test accurately models a shield violation (assistant message with a non-empty refusal), patches metrics.llm_calls_validation_errors_total, and asserts inc() is called exactly once. That matches the new loop in query_v2.retrieve_response that scans output items for refusals and bumps the validation-error metric.

One minor thought: if a response ever contained multiple violating message items, the metric would be incremented multiple times per call. If you intend this metric to be “per call” rather than “per offending message”, you could break after the first refusal, but that’s a behavioral choice, not a blocker.

src/app/endpoints/query_v2.py (1)

417-424: Shield violation detection is implemented as expected; consider counting once per call.

The loop over response.output checks each message-type item for a non-empty refusal and increments metrics.llm_calls_validation_errors_total for each. That matches the new tests and will correctly mark shield-triggered refusals.

If you intend llm_calls_validation_errors_total to be “per call” rather than “per offending message”, you could short-circuit after the first violation:

-    for output_item in response.output:
+    for output_item in response.output:
         ...
-        if item_type == "message":
-            refusal = getattr(output_item, "refusal", None)
-            if refusal:
-                metrics.llm_calls_validation_errors_total.inc()
-                logger.warning("Shield violation detected: %s", refusal)
+        if item_type == "message":
+            refusal = getattr(output_item, "refusal", None)
+            if refusal:
+                metrics.llm_calls_validation_errors_total.inc()
+                logger.warning("Shield violation detected: %s", refusal)
+                break

Not urgent, but it may make the metric easier to reason about.

src/app/endpoints/streaming_query_v2.py (1)

248-258: Streaming shield violation handling is consistent with the non‑streaming path.

On response.completed, you:

  • Grab the final response object.
  • Scan output for message items with a non-empty refusal.
  • Increment metrics.llm_calls_validation_errors_total and log a warning once per offending message.

This matches the semantics in query_v2.retrieve_response, and the new tests exercise both violation and no-violation cases. Same optional note as in the REST path: if you want a “per call” metric, consider breaking after the first refusal, but behavior is otherwise correct.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fb25b8c and b669b32.

📒 Files selected for processing (4)
  • src/app/endpoints/query_v2.py (4 hunks)
  • src/app/endpoints/streaming_query_v2.py (5 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (8 hunks)
  • tests/unit/app/endpoints/test_streaming_query_v2.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/unit/app/endpoints/test_query_v2.py (3)
tests/unit/app/endpoints/test_streaming_query_v2.py (2)
  • test_retrieve_response_with_shields_available (232-262)
  • test_retrieve_response_with_no_shields_available (266-290)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/app/endpoints/query_v2.py (1)
  • retrieve_response (309-449)
tests/unit/app/endpoints/test_streaming_query_v2.py (3)
tests/unit/app/endpoints/test_query_v2.py (3)
  • test_retrieve_response_with_shields_available (535-576)
  • test_retrieve_response_with_no_shields_available (580-615)
  • dummy_request (32-35)
src/app/endpoints/query_v2.py (1)
  • retrieve_response (309-449)
src/app/endpoints/streaming_query_v2.py (2)
  • retrieve_response (351-432)
  • streaming_query_endpoint_handler_v2 (321-348)
🪛 GitHub Actions: Unit tests
src/app/endpoints/streaming_query_v2.py

[error] 1-1: AttributeError: module 'app.endpoints.streaming_query_v2' does not have the attribute 'get_session'.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (10)
tests/unit/app/endpoints/test_streaming_query_v2.py (2)

41-42: Good: shields.list is mocked in existing streaming retrieve_response tests.

Adding mock_client.shields.list = mocker.AsyncMock(return_value=[]) here keeps the updated retrieve_response implementation from blowing up on missing shields while keeping the original test focus (tools & streaming flags) intact. No issues from this change.

Also applies to: 73-74


231-263: LGTM: streaming retrieve_response shields plumbing is covered.

These tests correctly:

  • Mock client.shields.list to return identifiers.
  • Assert that extra_body.guardrails is present when shields exist and absent when not.
  • Keep the rest of the behavior (vector stores, tools, system prompt) mocked but unconstrained, which is appropriate for this focused check.

No changes needed here.

tests/unit/app/endpoints/test_query_v2.py (4)

123-124: Good: existing retrieve_response tests are adapted to shields.

Adding mock_client.shields.list = mocker.AsyncMock(return_value=[]) in all the existing retrieve_response tests makes them compatible with the new shield discovery logic without changing what they assert (tools, usage, attachments, etc.). This is the right, minimal adjustment.

Also applies to: 161-162, 229-230, 276-277, 315-316, 354-355, 384-385


535-577: Non‑streaming shields plumbing test looks solid.

This test:

  • Mocks shields.list to return identifiers.
  • Verifies summary.llm_response and conv_id.
  • Asserts extra_body.guardrails matches the shield identifiers.

This directly exercises the new available_shields → extra_body["guardrails"] flow in query_v2.retrieve_response. No changes needed.


579-616: Correctly covers “no shields → no extra_body” behavior.

The test for the empty-shields case ensures:

  • client.shields.list returns [].
  • retrieve_response still returns the expected summary and conversation id.
  • extra_body is not present in the responses.create kwargs.

This matches the intended behavior and protects against regressions where we might start sending an empty guardrails list.


663-705: “No violation” test correctly guards against false positives.

Here you explicitly set output_item.refusal = None and assert that validation_metric.inc is never called. That’s important because with bare Mock instances, accessing an unset attribute would otherwise yield a truthy Mock, which would incorrectly look like a violation. This test ensures the “no refusal” path stays clean.
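As a small illustration of the pitfall described here (not taken from the PR's test code), unset attributes on a bare Mock are truthy, whereas an explicit None keeps the "no violation" path clean:

from types import SimpleNamespace
from unittest.mock import Mock

bare_item = Mock()
# Accessing an unset attribute on a Mock auto-creates a child Mock, which is
# truthy, so it would look like a refusal to the detection loop.
assert bool(bare_item.refusal)

safe_item = SimpleNamespace(type="message", refusal=None)
# Explicitly setting refusal=None keeps the "no violation" path clean.
assert not bool(safe_item.refusal)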

src/app/endpoints/query_v2.py (2)

325-331: Docstring update matches new shields behavior.

The extended description (“configures system prompts, shields, and toolgroups…”) accurately reflects the new logic in retrieve_response. No issues here.


345-350: Shields discovery and guardrails wiring look correct.

  • available_shields = [shield.identifier for shield in await client.shields.list()] keeps the interface generic and only passes identifiers downstream.
  • Logging both “no shields” and the list of shields should help debugging.
  • Adding extra_body = {"guardrails": available_shields} only when the list is non-empty aligns with the tests and avoids sending a meaningless empty guardrails list.

All of this is consistent and matches the non-streaming tests.

Also applies to: 388-391

src/app/endpoints/streaming_query_v2.py (2)

35-35: metrics import is appropriate and scoped to new shield-violation tracking.

metrics is only used for llm_calls_validation_errors_total in this file, so this import is justified and doesn’t introduce unused-symbol noise.


364-368: Streaming retrieve_response shields integration looks correct.

  • Docstring now mentions shields, system prompt, and tools, reflecting what the function actually configures.
  • available_shields is derived from await client.shields.list(), with useful logging for both empty and non-empty cases.
  • extra_body = {"guardrails": available_shields} is only attached when shields exist, matching the tests and the non-streaming implementation.

This keeps the streaming path aligned with the REST v2 query behavior.

Also applies to: 381-386, 423-426

Comment on lines 293 to 385
@pytest.mark.asyncio
async def test_streaming_response_detects_shield_violation(mocker, dummy_request):
    """Test that shield violations in streaming responses are detected and metrics incremented."""
    # Skip real config checks
    mocker.patch("app.endpoints.streaming_query.check_configuration_loaded")

    # Model selection plumbing
    mock_client = mocker.Mock()
    mock_client.models.list = mocker.AsyncMock(return_value=[mocker.Mock()])
    mocker.patch(
        "client.AsyncLlamaStackClientHolder.get_client", return_value=mock_client
    )
    mocker.patch(
        "app.endpoints.streaming_query.evaluate_model_hints",
        return_value=(None, None),
    )
    mocker.patch(
        "app.endpoints.streaming_query.select_model_and_provider_id",
        return_value=("llama/m", "m", "p"),
    )

    # SSE helpers
    mocker.patch(
        "app.endpoints.streaming_query_v2.stream_start_event",
        lambda conv_id: f"START:{conv_id}\n",
    )
    mocker.patch(
        "app.endpoints.streaming_query_v2.format_stream_data",
        lambda obj: f"EV:{obj['event']}:{obj['data'].get('token','')}\n",
    )
    mocker.patch(
        "app.endpoints.streaming_query_v2.stream_end_event",
        lambda _m, _s, _t, _media: "END\n",
    )

    # Conversation persistence and transcripts disabled
    mocker.patch(
        "app.endpoints.streaming_query_v2.persist_user_conversation_details",
        return_value=None,
    )
    mocker.patch(
        "app.endpoints.streaming_query_v2.is_transcripts_enabled", return_value=False
    )

    # Mock database and topic summary
    mock_session = mocker.Mock()
    mock_session.query.return_value.filter_by.return_value.first.return_value = (
        mocker.Mock()
    )
    mock_context_manager = mocker.Mock()
    mock_context_manager.__enter__ = mocker.Mock(return_value=mock_session)
    mock_context_manager.__exit__ = mocker.Mock(return_value=None)
    mocker.patch(
        "app.endpoints.streaming_query_v2.get_session",
        return_value=mock_context_manager,
    )
    mocker.patch(
        "app.endpoints.streaming_query_v2.get_topic_summary",
        mocker.AsyncMock(return_value=""),
    )
    mocker.patch(
        "app.endpoints.streaming_query_v2.store_conversation_into_cache",
        return_value=None,
    )
    mocker.patch(
        "app.endpoints.streaming_query_v2.create_referenced_documents_with_metadata",
        return_value=[],
    )

    # Mock the validation error metric
    validation_metric = mocker.patch("metrics.llm_calls_validation_errors_total")

    # Build a fake async stream with shield violation
    async def fake_stream_with_violation():
        yield SimpleNamespace(
            type="response.created", response=SimpleNamespace(id="conv-violation")
        )
        yield SimpleNamespace(type="response.output_text.delta", delta="I cannot ")
        yield SimpleNamespace(type="response.output_text.done", text="I cannot help")
        # Response completed with refusal in output
        violation_item = SimpleNamespace(
            type="message",
            role="assistant",
            refusal="Content violates safety policy",
        )
        response_with_violation = SimpleNamespace(
            id="conv-violation", output=[violation_item]
        )
        yield SimpleNamespace(
            type="response.completed", response=response_with_violation
        )

    mocker.patch(
        "app.endpoints.streaming_query_v2.retrieve_response",
        return_value=(fake_stream_with_violation(), ""),
    )

    mocker.patch("metrics.llm_calls_total")

    resp = await streaming_query_endpoint_handler_v2(
        request=dummy_request,
        query_request=QueryRequest(query="dangerous query"),
        auth=("user123", "", True, "token-abc"),
        mcp_headers={},
    )

    assert isinstance(resp, StreamingResponse)

    # Collect emitted events to trigger the generator
    events: list[str] = []
    async for chunk in resp.body_iterator:
        s = chunk.decode() if isinstance(chunk, (bytes, bytearray)) else str(chunk)
        events.append(s)

    # Verify that the validation error metric was incremented
    validation_metric.inc.assert_called_once()


⚠️ Potential issue | 🔴 Critical

Fix invalid patch target and stub cleanup dependencies at the right module.

mocker.patch("app.endpoints.streaming_query_v2.get_session", ...) fails because get_session is not defined in streaming_query_v2, matching the pipeline error, and even if it existed, it wouldn’t affect cleanup_after_streaming (which lives in utils.endpoints and uses its own module globals).

For this test to both pass and avoid touching real DB/cache, you should patch the dependencies where cleanup_after_streaming actually resolves them (likely utils.endpoints), and drop the invalid patch on streaming_query_v2.get_session.

For example:

-    mocker.patch(
-        "app.endpoints.streaming_query_v2.get_session",
-        return_value=mock_context_manager,
-    )
-    mocker.patch(
-        "app.endpoints.streaming_query_v2.store_conversation_into_cache",
-        return_value=None,
-    )
-    mocker.patch(
-        "app.endpoints.streaming_query_v2.create_referenced_documents_with_metadata",
-        return_value=[],
-    )
+    mocker.patch(
+        "utils.endpoints.get_session",
+        return_value=mock_context_manager,
+    )
+    mocker.patch(
+        "utils.endpoints.store_conversation_into_cache",
+        return_value=None,
+    )
+    mocker.patch(
+        "utils.endpoints.create_referenced_documents_with_metadata",
+        return_value=[],
+    )

This resolves the AttributeError and ensures the test stubs the actual call sites used by cleanup_after_streaming.

🤖 Prompt for AI Agents
In tests/unit/app/endpoints/test_streaming_query_v2.py around lines 293 to 409,
the test patches get_session on app.endpoints.streaming_query_v2 which does not
exist and cleanup_after_streaming resolves its dependencies from
utils.endpoints; remove the mocker.patch targeting
"app.endpoints.streaming_query_v2.get_session" and instead patch the real
call-sites in utils.endpoints (e.g., "utils.endpoints.get_session",
"utils.endpoints.get_topic_summary",
"utils.endpoints.store_conversation_into_cache",
"utils.endpoints.create_referenced_documents_with_metadata" or whichever
functions cleanup_after_streaming imports) so the cleanup logic uses the test
stubs and no real DB/cache is touched.

Comment on lines 411 to 472
@pytest.mark.asyncio
async def test_streaming_response_no_shield_violation(mocker, dummy_request):
    """Test that no metric is incremented when there's no shield violation in streaming."""
    # Skip real config checks
    mocker.patch("app.endpoints.streaming_query.check_configuration_loaded")

    # Model selection plumbing
    mock_client = mocker.Mock()
    mock_client.models.list = mocker.AsyncMock(return_value=[mocker.Mock()])
    mocker.patch(
        "client.AsyncLlamaStackClientHolder.get_client", return_value=mock_client
    )
    mocker.patch(
        "app.endpoints.streaming_query.evaluate_model_hints",
        return_value=(None, None),
    )
    mocker.patch(
        "app.endpoints.streaming_query.select_model_and_provider_id",
        return_value=("llama/m", "m", "p"),
    )

    # SSE helpers
    mocker.patch(
        "app.endpoints.streaming_query_v2.stream_start_event",
        lambda conv_id: f"START:{conv_id}\n",
    )
    mocker.patch(
        "app.endpoints.streaming_query_v2.format_stream_data",
        lambda obj: f"EV:{obj['event']}:{obj['data'].get('token','')}\n",
    )
    mocker.patch(
        "app.endpoints.streaming_query_v2.stream_end_event",
        lambda _m, _s, _t, _media: "END\n",
    )

    # Conversation persistence and transcripts disabled
    mocker.patch(
        "app.endpoints.streaming_query_v2.persist_user_conversation_details",
        return_value=None,
    )
    mocker.patch(
        "app.endpoints.streaming_query_v2.is_transcripts_enabled", return_value=False
    )

    # Mock database and topic summary
    mock_session = mocker.Mock()
    mock_session.query.return_value.filter_by.return_value.first.return_value = (
        mocker.Mock()
    )
    mock_context_manager = mocker.Mock()
    mock_context_manager.__enter__ = mocker.Mock(return_value=mock_session)
    mock_context_manager.__exit__ = mocker.Mock(return_value=None)
    mocker.patch(
        "app.endpoints.streaming_query_v2.get_session",
        return_value=mock_context_manager,
    )
    mocker.patch(
        "app.endpoints.streaming_query_v2.get_topic_summary",
        mocker.AsyncMock(return_value=""),
    )
    mocker.patch(
        "app.endpoints.streaming_query_v2.store_conversation_into_cache",
        return_value=None,
    )
    mocker.patch(
        "app.endpoints.streaming_query_v2.create_referenced_documents_with_metadata",
        return_value=[],
    )

    # Mock the validation error metric
    validation_metric = mocker.patch("metrics.llm_calls_validation_errors_total")

    # Build a fake async stream without violation
    async def fake_stream_without_violation():
        yield SimpleNamespace(
            type="response.created", response=SimpleNamespace(id="conv-safe")
        )
        yield SimpleNamespace(type="response.output_text.delta", delta="Safe ")
        yield SimpleNamespace(type="response.output_text.done", text="Safe response")
        # Response completed without refusal
        safe_item = SimpleNamespace(
            type="message",
            role="assistant",
            refusal=None,  # No violation
        )
        response_safe = SimpleNamespace(id="conv-safe", output=[safe_item])
        yield SimpleNamespace(type="response.completed", response=response_safe)

    mocker.patch(
        "app.endpoints.streaming_query_v2.retrieve_response",
        return_value=(fake_stream_without_violation(), ""),
    )

    mocker.patch("metrics.llm_calls_total")

    resp = await streaming_query_endpoint_handler_v2(
        request=dummy_request,
        query_request=QueryRequest(query="safe query"),
        auth=("user123", "", True, "token-abc"),
        mcp_headers={},
    )

    assert isinstance(resp, StreamingResponse)

    # Collect emitted events to trigger the generator
    events: list[str] = []
    async for chunk in resp.body_iterator:
        s = chunk.decode() if isinstance(chunk, (bytes, bytearray)) else str(chunk)
        events.append(s)

    # Verify that the validation error metric was NOT incremented
    validation_metric.inc.assert_not_called()

⚠️ Potential issue | 🔴 Critical

Apply the same patch-target fix in the “no shield violation” streaming test.

The second streaming shield test mirrors the first one and repeats the same incorrect patch targets (app.endpoints.streaming_query_v2.get_session, store_conversation_into_cache, create_referenced_documents_with_metadata). Update them to patch utils.endpoints.* instead, just like suggested above, so both tests stub the real dependencies and avoid the same AttributeError.

🤖 Prompt for AI Agents
In tests/unit/app/endpoints/test_streaming_query_v2.py around lines 411 to 522,
the test currently patches app.endpoints.streaming_query_v2.get_session,
app.endpoints.streaming_query_v2.store_conversation_into_cache and
app.endpoints.streaming_query_v2.create_referenced_documents_with_metadata which
is the wrong target and causes AttributeError; change those patch targets to the
utils.endpoints module counterparts (utils.endpoints.get_session,
utils.endpoints.store_conversation_into_cache,
utils.endpoints.create_referenced_documents_with_metadata) keeping the same
return values/AsyncMock usage so the test stubs the real dependencies correctly.

It includes both streaming and non-streaming support by leveraging the refusal field on the response.
@luis5tb luis5tb force-pushed the responses_v2_shields branch from b669b32 to fe1d7b6 on November 14, 2025 at 08:25

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (5)
tests/unit/app/endpoints/test_query_v2.py (1)

123-124: Consider a pytest fixture for the shields mock to reduce duplication.

Multiple tests now mock shields.list identically. A shared fixture could eliminate this repetition and make future updates easier.

Example:

@pytest.fixture
def mock_empty_shields(mocker: MockerFixture) -> None:
    """Mock shields.list to return empty list."""
    return mocker.AsyncMock(return_value=[])

Then update tests to use it:

-    # Mock shields.list
-    mock_client.shields.list = mocker.AsyncMock(return_value=[])
+    mock_client.shields.list = mock_empty_shields

Also applies to: 161-162, 229-230, 276-277, 315-316, 354-355, 384-385

tests/unit/app/endpoints/test_streaming_query_v2.py (1)

41-42: Consider a pytest fixture for the shields mock (applies here too).

Same as in test_query_v2.py, a shared fixture could reduce duplication across streaming tests.

Also applies to: 73-74

src/app/endpoints/query_v2.py (2)

345-350: Shield discovery logic is duplicated in streaming_query_v2.py.

The shield discovery (lines 345-350) and propagation (lines 388-390) logic is identical in streaming_query_v2.py (lines 381-386, 423-425). Consider extracting this into a shared utility function to maintain consistency and reduce duplication.

Example:

# In utils/shields.py or similar
async def get_available_shields(
    client: AsyncLlamaStackClient
) -> list[str]:
    """Discover and return available shield identifiers."""
    shields = [shield.identifier for shield in await client.shields.list()]
    if not shields:
        logger.info("No available shields. Disabling safety")
    else:
        logger.info("Available shields: %s", shields)
    return shields

Then use it in both files:

available_shields = await get_available_shields(client)
# ...
if available_shields:
    create_kwargs["extra_body"] = {"guardrails": available_shields}

Also applies to: 388-390


417-424: Shield violation detection is duplicated in streaming_query_v2.py.

The violation detection logic (lines 417-424) is nearly identical in streaming_query_v2.py (lines 248-257). Consider extracting this into a shared utility function alongside the shield discovery refactor.

Example:

# In utils/shields.py
def detect_shield_violations(output_items: list[Any]) -> bool:
    """Check output items for shield violations and update metrics."""
    for output_item in output_items:
        item_type = getattr(output_item, "type", None)
        if item_type == "message":
            refusal = getattr(output_item, "refusal", None)
            if refusal:
                metrics.llm_calls_validation_errors_total.inc()
                logger.warning("Shield violation detected: %s", refusal)
                return True
    return False
src/app/endpoints/streaming_query_v2.py (1)

381-386: Shield discovery and propagation duplicated from query_v2.py.

This code is identical to query_v2.py lines 345-350 and 388-390. Please see the refactoring suggestion in the query_v2.py review to extract this into a shared utility.

Also applies to: 423-425

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b669b32 and fe1d7b6.

📒 Files selected for processing (4)
  • src/app/endpoints/query_v2.py (4 hunks)
  • src/app/endpoints/streaming_query_v2.py (5 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (8 hunks)
  • tests/unit/app/endpoints/test_streaming_query_v2.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/unit/app/endpoints/test_streaming_query_v2.py (3)
tests/unit/app/endpoints/test_query_v2.py (2)
  • test_retrieve_response_with_shields_available (535-576)
  • test_retrieve_response_with_no_shields_available (580-617)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/app/endpoints/streaming_query_v2.py (2)
  • retrieve_response (351-432)
  • streaming_query_endpoint_handler_v2 (321-348)
tests/unit/app/endpoints/test_query_v2.py (2)
tests/unit/app/endpoints/test_streaming_query_v2.py (2)
  • test_retrieve_response_with_shields_available (232-262)
  • test_retrieve_response_with_no_shields_available (266-292)
src/app/endpoints/query_v2.py (1)
  • retrieve_response (309-449)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (6)
tests/unit/app/endpoints/test_query_v2.py (1)

534-711: Well-structured test coverage for shields integration.

The four new tests comprehensively verify shield behavior: availability detection, guardrail propagation, violation detection, and metric updates. The test structure is clear and follows established patterns.

tests/unit/app/endpoints/test_streaming_query_v2.py (1)

231-472: Excellent streaming shield test coverage.

The new tests verify shields integration at both the retrieve_response level (guardrail propagation) and the full streaming endpoint level (violation detection in SSE flow). The mock streaming chunks with refusal data are properly structured.

src/app/endpoints/query_v2.py (1)

325-330: Good documentation update.

The docstring now accurately reflects that shields are configured alongside system prompts and toolgroups.

src/app/endpoints/streaming_query_v2.py (3)

35-35: Metrics import is correct.

The import is necessary for incrementing llm_calls_validation_errors_total when shield violations are detected.


248-257: Shield violation detection is correctly placed in streaming flow.

The violation check happens after response.completed, using the captured latest_response_object. This ensures all output items are available for inspection. However, note the duplication concern flagged in query_v2.py.


364-368: Docstring accurately reflects shield configuration.

The updated docstring correctly describes that shields are configured alongside system prompts and tools.

@luis5tb luis5tb changed the title from "[WIP] Responses v2 shields" to "Responses v2 shields" on Nov 14, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
src/utils/shields.py (2)

13-28: Make shield discovery more robust and avoid failing the request on list() errors

Right now any failure in await client.shields.list() will bubble up and fail the whole request, even though shields are optional. It also assumes that every returned shield has a non‑empty identifier attribute.

To make this path safer and less brittle, consider:

  • Catching exceptions from client.shields.list() and falling back to [] while logging a warning.
  • Normalizing identifiers via getattr and filtering out None/empty ones.
  • Tweaking the log message to reflect that we’re “running without guardrails” rather than “disabling safety,” since this function doesn’t actually toggle anything.

For example:

-    available_shields = [shield.identifier for shield in await client.shields.list()]
-    if not available_shields:
-        logger.info("No available shields. Disabling safety")
-    else:
-        logger.info("Available shields: %s", available_shields)
-    return available_shields
+    try:
+        shields = await client.shields.list()
+    except Exception as exc:  # pylint: disable=broad-exception-caught
+        logger.warning(
+            "Failed to list shields from Llama Stack, proceeding without guardrails: %s",
+            exc,
+        )
+        return []
+
+    available_shields = [
+        getattr(shield, "identifier", None) for shield in shields
+    ]
+    available_shields = [s for s in available_shields if s]
+
+    if not available_shields:
+        logger.info("No available shields discovered; proceeding without guardrails")
+    else:
+        logger.info("Available shields: %s", available_shields)
+    return available_shields

This keeps shields best‑effort and avoids a hard dependency on the exact return type shape.


31-53: Clarify what counts as a “shield violation” and consider expanding detection coverage

The implementation is simple and effective for the specific case of a message output item with a top‑level, non‑empty refusal attribute, and the metric increment + warning log make sense.

A couple of follow‑ups to consider:

  • If the Responses API ever encodes refusals inside content parts (e.g., a part with a refusal field, similar to your _extract_text_from_response_output_item handling), this helper will miss them. You may want to either:
    • Explicitly document that you only consider top‑level output_item.refusal, or
    • Extend detection to mirror the traversal you already do in _extract_text_from_response_output_item.
  • The function returns bool but the current call site ignores the return value; if you don’t plan to branch on this, you could either drop the return value (and treat this as a pure side‑effect helper) or have the caller log/annotate based on it for observability.

No blocking issues here, but tightening the definition and coverage now will avoid surprises when the response schema evolves.
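A hedged sketch of the broader traversal suggested above; the content/refusal attribute names on content parts are assumptions about the Responses output schema, not confirmed by this PR, and the function name is deliberately distinct from the PR's helper:

import logging
from typing import Any

import metrics  # project-local metrics module, as used elsewhere in this PR

logger = logging.getLogger(__name__)


def detect_shield_violations_deep(output_items: list[Any]) -> bool:
    """Check top-level refusals and (assumed) per-part refusals on message items."""
    for output_item in output_items:
        if getattr(output_item, "type", None) != "message":
            continue
        candidates = [getattr(output_item, "refusal", None)]
        # Assumption: refusals may also appear on individual content parts.
        for part in getattr(output_item, "content", []) or []:
            candidates.append(getattr(part, "refusal", None))
        for refusal in candidates:
            if refusal:
                metrics.llm_calls_validation_errors_total.inc()
                logger.warning("Shield violation detected: %s", refusal)
                return True
    return False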

src/app/endpoints/query_v2.py (1)

346-348: Shield integration is correct at a high level; consider resilience, caching, and return‑value use

The overall flow makes sense:

  • Discover shields once via available_shields = await get_available_shields(client).
  • If any are present, inject them into create_kwargs["extra_body"] = {"guardrails": available_shields} so they apply to the Responses call.
  • After processing all output items, run detect_shield_violations(response.output) to bump the validation‑error metric when a refusal is present.

A few refinements worth considering:

  1. Don’t let shield listing failures break the request

    As get_available_shields is called on every request, any transient failure in client.shields.list() will currently abort the whole /query call. If shields are meant to be best‑effort, it’s safer to let get_available_shields swallow errors and return [] (see suggested refactor in src/utils/shields.py) so this endpoint remains functional even when shield discovery is flaky.

  2. Avoid repeated remote shield discovery per request

    You’re now doing a round‑trip to Llama Stack shields on every call to retrieve_response. If the set of shields is relatively static per process/config, consider caching them at process level (or with a short TTL) and reusing them instead of calling .list() every time. This will reduce latency and load on Llama Stack. A minimal sketch of this caching idea follows this comment.

  3. Validate extra_body / guardrails shape against the client API

    The extra_body={"guardrails": available_shields} convention relies on the current AsyncLlamaStackClient.responses.create semantics. If the guardrails field ever changes name/shape upstream, this will fail at runtime. It’s worth double‑checking this against the version of the Llama Stack client you’re targeting.

  4. Use or drop the boolean return from detect_shield_violations

    Since the return value is currently ignored, either:

    • Capture it and, for example, log a high‑level “shield violation occurred” log with context (conversation_id, model), or
    • Change detect_shield_violations to return None if you intend it to be purely side‑effectful. This would make the API intent clearer.

None of these are blockers, but addressing (1) and (2) in particular will make this path more robust in production.

Also applies to: 385-388, 414-416
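To make point 2 concrete, here is a minimal per-process caching sketch; the TTL value, the cache variables, and the wrapper name are illustrative assumptions, not part of the PR:

import time
from typing import Any

# Module-level cache of shield identifiers (hypothetical names).
_SHIELDS_CACHE: list[str] | None = None
_SHIELDS_CACHE_AT: float = 0.0
_SHIELDS_TTL_SECONDS = 300.0


async def get_available_shields_cached(client: Any) -> list[str]:
    """Return cached shield identifiers, refreshing them after the TTL expires."""
    global _SHIELDS_CACHE, _SHIELDS_CACHE_AT
    now = time.monotonic()
    if _SHIELDS_CACHE is None or now - _SHIELDS_CACHE_AT > _SHIELDS_TTL_SECONDS:
        # Same discovery call the PR already makes, just done at most once per TTL.
        _SHIELDS_CACHE = [shield.identifier for shield in await client.shields.list()]
        _SHIELDS_CACHE_AT = now
    return _SHIELDS_CACHE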

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe1d7b6 and dccb61b.

📒 Files selected for processing (3)
  • src/app/endpoints/query_v2.py (5 hunks)
  • src/app/endpoints/streaming_query_v2.py (5 hunks)
  • src/utils/shields.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/app/endpoints/streaming_query_v2.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/app/endpoints/query_v2.py (1)
src/utils/shields.py (2)
  • detect_shield_violations (31-54)
  • get_available_shields (13-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (3)
src/utils/shields.py (1)

1-11: Module setup looks straightforward

Docstring, imports, and logger initialization are consistent with the rest of the codebase style; no issues from a structure standpoint.

src/app/endpoints/query_v2.py (2)

37-37: Shields utilities import is appropriately scoped

Pulling detect_shield_violations and get_available_shields into this module keeps the response logic focused and avoids leaking shield details into unrelated layers.


326-332: Docstring accurately reflects the new shields behavior

The expanded description calling out shields alongside system prompts and toolgroups matches the actual implementation below (guardrails in extra_body and violation metrics), and keeps the function contract clear.

@eranco74

/lgtm


@tisnik tisnik left a comment


LGTM

@tisnik tisnik merged commit c1e4aee into lightspeed-core:main Nov 19, 2025
21 of 23 checks passed