Description
System Info
- LlamaStack Version: 0.4.0.dev0
- Distribution: nvidia
- Provider: remote::nvidia safety provider
- Guardrails Service: NeMo Guardrails 0.10.x
Information
- The official example scripts
- My own modified scripts
🐛 Describe the bug
The NVIDIA safety provider in LlamaStack calls the wrong endpoint when communicating with the NeMo Guardrails service, causing safety/shield functionality to fail with a 500 Internal Server Error.
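As the traceback below shows, the provider POSTs to `/v1/guardrail/checks` on the configured guardrails service. A minimal sketch of that request, for reference when testing against the service directly (the payload key names are assumptions; only the path and the `config_id` field are visible in the traceback):

```python
# Sketch of the request the provider appears to build; only the
# "/v1/guardrail/checks" path and the config_id field are confirmed
# by the traceback, the rest of the payload shape is an assumption.
def build_check_request(base_url: str, messages: list[dict], config_id: str) -> tuple[str, dict]:
    url = f"{base_url}/v1/guardrail/checks"  # endpoint seen in the error log
    payload = {
        "messages": messages,
        "guardrails": {"config_id": config_id},  # hypothetical nesting
    }
    return url, payload
```

Sending the returned payload with `requests.post(url, json=payload)` reproduces the 500 error shown in the logs below.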
Steps to Reproduce
- Configure LlamaStack with the nvidia safety provider:

```yaml
providers:
  safety:
  - provider_id: nvidia
    provider_type: remote::nvidia
    config:
      guardrails_service_url: http://nemoguardrails-sample:8000
      config_id: demo-self-check-input-output
      model: meta/llama-3.2-1b-instruct
```

- Register a shield:
```shell
curl -X POST http://localhost:8321/v1/shields \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "demo-self-check-input-output",
    "provider_id": "nvidia",
    "provider_shield_id": "demo-self-check-input-output",
    "params": {"model": "meta/llama-3.2-1b-instruct"}
  }'
```

- Create a guardrails config in the NeMo Guardrails service:
```shell
curl -X POST http://guardrails-service:8000/v1/guardrail/configs \
  -H "Content-Type: application/json" \
  -d '{
    "name": "demo-self-check-input-output",
    "namespace": "default",
    "data": {
      "prompts": [...],
      "rails": {...}
    }
  }'
```

- Try to run the shield via the LlamaStack API:
```shell
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "demo-self-check-input-output",
    "messages": [{"role": "user", "content": "You are stupid"}]
  }'
```

Error logs
ERROR 2025-11-19 15:18:45,931 llama_stack.core.server.server:285 core::server: Error executing endpoint
route='/v1/safety/run-shield' method='post'
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /workspace/src/llama_stack/core/server/server.py:275 in route_handler │
│ │
│ 272 │ │ │ │ │ return StreamingResponse(gen, media_type="text/event-stream") │
│ 273 │ │ │ │ else: │
│ 274 │ │ │ │ │ value = func(**kwargs) │
│ ❱ 275 │ │ │ │ │ result = await maybe_await(value) │
│ 276 │ │ │ │ │ if isinstance(result, PaginatedResponse) and result.url is None: │
│ 277 │ │ │ │ │ │ result.url = route │
│ 278 │
│ │
│ /workspace/src/llama_stack/core/server/server.py:197 in maybe_await │
│ │
│ 194 │
│ 195 async def maybe_await(value): │
│ 196 │ if inspect.iscoroutine(value): │
│ ❱ 197 │ │ return await value │
│ 198 │ return value │
│ 199 │
│ 200 │
│ │
│ /workspace/src/llama_stack/core/telemetry/trace_protocol.py:103 in async_wrapper │
│ │
│ 100 │ │ │ │
│ 101 │ │ │ with tracing.span(f"{class_name}.{method_name}", span_attributes) as span: │
│ 102 │ │ │ │ try: │
│ ❱ 103 │ │ │ │ │ result = await method(self, *args, **kwargs) │
│ 104 │ │ │ │ │ span.set_attribute("output", serialize_value(result)) │
│ 105 │ │ │ │ │ return result │
│ 106 │ │ │ │ except Exception as e: │
│ │
│ /workspace/src/llama_stack/core/routers/safety.py:60 in run_shield │
│ │
│ 57 │ ) -> RunShieldResponse: │
│ 58 │ │ logger.debug(f"SafetyRouter.run_shield: {shield_id}") │
│ 59 │ │ provider = await self.routing_table.get_provider_impl(shield_id) │
│ ❱ 60 │ │ return await provider.run_shield( │
│ 61 │ │ │ shield_id=shield_id, │
│ 62 │ │ │ messages=messages, │
│ 63 │ │ │ params=params, │
│ │
│ /workspace/src/llama_stack/core/telemetry/trace_protocol.py:103 in async_wrapper │
│ │
│ 100 │ │ │ │
│ 101 │ │ │ with tracing.span(f"{class_name}.{method_name}", span_attributes) as span: │
│ 102 │ │ │ │ try: │
│ ❱ 103 │ │ │ │ │ result = await method(self, *args, **kwargs) │
│ 104 │ │ │ │ │ span.set_attribute("output", serialize_value(result)) │
│ 105 │ │ │ │ │ return result │
│ 106 │ │ │ │ except Exception as e: │
│ │
│ /workspace/src/llama_stack/providers/remote/safety/nvidia/nvidia.py:67 in run_shield │
│ │
│ 64 │ │ │ raise ValueError(f"Shield {shield_id} not found") │
│ 65 │ │ │
│ 66 │ │ self.shield = NeMoGuardrails(self.config, shield.shield_id) │
│ ❱ 67 │ │ return await self.shield.run(messages) │
│ 68 │ │
│ 69 │ async def run_moderation(self, input: str | list[str], model: str | None = None) -> │
│ ModerationObject: │
│ 70 │ │ raise NotImplementedError("NVIDIA safety provider currently does not implement │
│ run_moderation") │
│ │
│ /workspace/src/llama_stack/providers/remote/safety/nvidia/nvidia.py:147 in run │
│ │
│ 144 │ │ │ │ "config_id": self.config_id, │
│ 145 │ │ │ }, │
│ 146 │ │ } │
│ ❱ 147 │ │ response = await self._guardrails_post(path="/v1/guardrail/checks", │
│ data=request_data) │
│ 148 │ │ │
│ 149 │ │ if response["status"] == "blocked": │
│ 150 │ │ │ user_message = "Sorry I cannot do this." │
│ │
│ /workspace/src/llama_stack/providers/remote/safety/nvidia/nvidia.py:117 in _guardrails_post │
│ │
│ 114 │ │ │ "Accept": "application/json", │
│ 115 │ │ } │
│ 116 │ │ response = requests.post(url=f"{self.guardrails_service_url}{path}", │
│ headers=headers, json=data) │
│ ❱ 117 │ │ response.raise_for_status() │
│ 118 │ │ return response.json() │
│ 119 │ │
│ 120 │ async def run(self, messages: list[OpenAIMessageParam]) -> RunShieldResponse: │
│ │
│ /usr/local/lib/python3.12/site-packages/requests/models.py:1026 in raise_for_status │
│ │
│ 1023 │ │ │ ) │
│ 1024 │ │ │
│ 1025 │ │ if http_error_msg: │
│ ❱ 1026 │ │ │ raise HTTPError(http_error_msg, response=self) │
│ 1027 │ │
│ 1028 │ def close(self): │
│ 1029 │ │ """Releases the connection back to the pool. Once this method has been │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
HTTPError: 500 Server Error: Internal Server Error for url:
http://nemoguardrails-sample.hacohen-nemo.svc.cluster.local:8000/v1/guardrail/checks
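Note that `raise_for_status()` discards the response body, which may contain the service-side error detail behind the 500. The endpoint can be probed directly to capture it; a minimal stdlib sketch (the URL and payload passed in are placeholders for the values above):

```python
import json
import urllib.error
import urllib.request

def probe_guardrails(url: str, payload: dict) -> tuple[int, str]:
    """POST to the guardrails endpoint and return (status, body), keeping
    the body of a 4xx/5xx response that raise_for_status() would hide."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as e:
        # On the observed 500, return the service's error detail instead of raising
        return e.code, e.read().decode()
```

Printing the returned body against the failing URL should show whether the guardrails service rejects the path itself or the request payload.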
Expected behavior
The safety provider should successfully communicate with the NeMo Guardrails service and return a safety response indicating whether the content should be blocked.
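For context, the provider's handling of a successful guardrails response, reconstructed from the fragment visible in the traceback (`if response["status"] == "blocked": user_message = "Sorry I cannot do this."`); the full response schema is an assumption:

```python
# Reconstructed from the nvidia.py lines visible in the traceback: a
# "blocked" status maps to a refusal message. Any other status is treated
# as allowed here, which is an assumption about the full schema.
def interpret_guardrails_response(response: dict) -> dict:
    if response.get("status") == "blocked":
        return {"blocked": True, "user_message": "Sorry I cannot do this."}
    return {"blocked": False, "user_message": None}
```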