Description
System Info
- LlamaStack Version: 0.4.0.dev0
- Distribution: nvidia
- Provider: remote::nvidia safety provider
- Guardrails Service: NeMo Guardrails 0.10.x
Information
- The official example scripts
- My own modified scripts
🐛 Describe the bug
The NVIDIA safety provider in LlamaStack calls the wrong endpoint when communicating with the NeMo Guardrails service, causing safety/shield functionality to fail with a 500 Internal Server Error.
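As the traceback below shows, the provider POSTs to `/v1/guardrail/checks` on the configured guardrails service. A minimal sketch of that request, for reference when testing against the service directly (the payload key names are assumptions; only the path and the `config_id` field are visible in the traceback):

```python
# Sketch of the request the provider appears to build; only the
# "/v1/guardrail/checks" path and the config_id field are confirmed
# by the traceback, the rest of the payload shape is an assumption.
def build_check_request(base_url: str, messages: list[dict], config_id: str) -> tuple[str, dict]:
    url = f"{base_url}/v1/guardrail/checks"  # endpoint seen in the error log
    payload = {
        "messages": messages,
        "guardrails": {"config_id": config_id},  # hypothetical nesting
    }
    return url, payload
```

Sending the returned payload with `requests.post(url, json=payload)` reproduces the 500 error shown in the logs below.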
Steps to Reproduce
- Configure LlamaStack with the nvidia safety provider:

```yaml
providers:
  safety:
  - provider_id: nvidia
    provider_type: remote::nvidia
    config:
      guardrails_service_url: http://nemoguardrails-sample:8000
      config_id: demo-self-check-input-output
      model: meta/llama-3.2-1b-instruct
```

- Register a shield:
```shell
curl -X POST http://localhost:8321/v1/shields \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "demo-self-check-input-output",
    "provider_id": "nvidia",
    "provider_shield_id": "demo-self-check-input-output",
    "params": {"model": "meta/llama-3.2-1b-instruct"}
  }'
```

- Create a guardrails config in the NeMo Guardrails service:
```shell
curl -X POST http://guardrails-service:8000/v1/guardrail/configs \
  -H "Content-Type: application/json" \
  -d '{
    "name": "demo-self-check-input-output",
    "namespace": "default",
    "data": {
      "prompts": [...],
      "rails": {...}
    }
  }'
```

- Try to run the shield via the LlamaStack API:
```shell
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "demo-self-check-input-output",
    "messages": [{"role": "user", "content": "You are stupid"}]
  }'
```

Error logs
ERROR 2025-11-19 15:18:45,931 llama_stack.core.server.server:285 core::server: Error executing endpoint
route='/v1/safety/run-shield' method='post'
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /workspace/src/llama_stack/core/server/server.py:275 in route_handler │
│ │
│ 272 │ │ │ │ │ return StreamingResponse(gen, media_type="text/event-stream") │
│ 273 │ │ │ │ else: │
│ 274 │ │ │ │ │ value = func(**kwargs) │
│ ❱ 275 │ │ │ │ │ result = await maybe_await(value) │
│ 276 │ │ │ │ │ if isinstance(result, PaginatedResponse) and result.url is None: │
│ 277 │ │ │ │ │ │ result.url = route │
│ 278 │
│ │
│ /workspace/src/llama_stack/core/server/server.py:197 in maybe_await │
│ │
│ 194 │
│ 195 async def maybe_await(value): │
│ 196 │ if inspect.iscoroutine(value): │
│ ❱ 197 │ │ return await value │
│ 198 │ return value │
│ 199 │
│ 200 │
│ │
│ /workspace/src/llama_stack/core/telemetry/trace_protocol.py:103 in async_wrapper │
│ │
│ 100 │ │ │ │
│ 101 │ │ │ with tracing.span(f"{class_name}.{method_name}", span_attributes) as span: │
│ 102 │ │ │ │ try: │
│ ❱ 103 │ │ │ │ │ result = await method(self, *args, **kwargs) │
│ 104 │ │ │ │ │ span.set_attribute("output", serialize_value(result)) │
│ 105 │ │ │ │ │ return result │
│ 106 │ │ │ │ except Exception as e: │
│ │
│ /workspace/src/llama_stack/core/routers/safety.py:60 in run_shield │
│ │
│ 57 │ ) -> RunShieldResponse: │
│ 58 │ │ logger.debug(f"SafetyRouter.run_shield: {shield_id}") │
│ 59 │ │ provider = await self.routing_table.get_provider_impl(shield_id) │
│ ❱ 60 │ │ return await provider.run_shield( │
│ 61 │ │ │ shield_id=shield_id, │
│ 62 │ │ │ messages=messages, │
│ 63 │ │ │ params=params, │
│ │
│ /workspace/src/llama_stack/core/telemetry/trace_protocol.py:103 in async_wrapper │
│ │
│ 100 │ │ │ │
│ 101 │ │ │ with tracing.span(f"{class_name}.{method_name}", span_attributes) as span: │
│ 102 │ │ │ │ try: │
│ ❱ 103 │ │ │ │ │ result = await method(self, *args, **kwargs) │
│ 104 │ │ │ │ │ span.set_attribute("output", serialize_value(result)) │
│ 105 │ │ │ │ │ return result │
│ 106 │ │ │ │ except Exception as e: │
│ │
│ /workspace/src/llama_stack/providers/remote/safety/nvidia/nvidia.py:67 in run_shield │
│ │
│ 64 │ │ │ raise ValueError(f"Shield {shield_id} not found") │
│ 65 │ │ │
│ 66 │ │ self.shield = NeMoGuardrails(self.config, shield.shield_id) │
│ ❱ 67 │ │ return await self.shield.run(messages) │
│ 68 │ │
│ 69 │ async def run_moderation(self, input: str | list[str], model: str | None = None) -> │
│ ModerationObject: │
│ 70 │ │ raise NotImplementedError("NVIDIA safety provider currently does not implement │
│ run_moderation") │
│ │
│ /workspace/src/llama_stack/providers/remote/safety/nvidia/nvidia.py:147 in run │
│ │
│ 144 │ │ │ │ "config_id": self.config_id, │
│ 145 │ │ │ }, │
│ 146 │ │ } │
│ ❱ 147 │ │ response = await self._guardrails_post(path="/v1/guardrail/checks", │
│ data=request_data) │
│ 148 │ │ │
│ 149 │ │ if response["status"] == "blocked": │
│ 150 │ │ │ user_message = "Sorry I cannot do this." │
│ │
│ /workspace/src/llama_stack/providers/remote/safety/nvidia/nvidia.py:117 in _guardrails_post │
│ │
│ 114 │ │ │ "Accept": "application/json", │
│ 115 │ │ } │
│ 116 │ │ response = requests.post(url=f"{self.guardrails_service_url}{path}", │
│ headers=headers, json=data) │
│ ❱ 117 │ │ response.raise_for_status() │
│ 118 │ │ return response.json() │
│ 119 │ │
│ 120 │ async def run(self, messages: list[OpenAIMessageParam]) -> RunShieldResponse: │
│ │
│ /usr/local/lib/python3.12/site-packages/requests/models.py:1026 in raise_for_status │
│ │
│ 1023 │ │ │ ) │
│ 1024 │ │ │
│ 1025 │ │ if http_error_msg: │
│ ❱ 1026 │ │ │ raise HTTPError(http_error_msg, response=self) │
│ 1027 │ │
│ 1028 │ def close(self): │
│ 1029 │ │ """Releases the connection back to the pool. Once this method has been │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
HTTPError: 500 Server Error: Internal Server Error for url:
http://nemoguardrails-sample.hacohen-nemo.svc.cluster.local:8000/v1/guardrail/checks
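Note that `raise_for_status()` discards the response body, which may contain the service-side error detail behind the 500. The endpoint can be probed directly to capture it; a minimal stdlib sketch (the URL and payload passed in are placeholders for the values above):

```python
import json
import urllib.error
import urllib.request

def probe_guardrails(url: str, payload: dict) -> tuple[int, str]:
    """POST to the guardrails endpoint and return (status, body), keeping
    the body of a 4xx/5xx response that raise_for_status() would hide."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as e:
        # On the observed 500, return the service's error detail instead of raising
        return e.code, e.read().decode()
```

Printing the returned body against the failing URL should show whether the guardrails service rejects the path itself or the request payload.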
Expected behavior
The safety provider should successfully communicate with the NeMo Guardrails service and return a safety response indicating whether the content should be blocked.
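For context, the provider's handling of a successful guardrails response, reconstructed from the fragment visible in the traceback (`if response["status"] == "blocked": user_message = "Sorry I cannot do this."`); the full response schema is an assumption:

```python
# Reconstructed from the nvidia.py lines visible in the traceback: a
# "blocked" status maps to a refusal message. Any other status is treated
# as allowed here, which is an assumption about the full schema.
def interpret_guardrails_response(response: dict) -> dict:
    if response.get("status") == "blocked":
        return {"blocked": True, "user_message": "Sorry I cannot do this."}
    return {"blocked": False, "user_message": None}
```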