Commit 28c85b8

Update formatting
1 parent fa9bd06 commit 28c85b8

1 file changed: +3 -0 lines changed

site-src/implementations/model-servers.md

Lines changed: 3 additions & 0 deletions
@@ -23,6 +23,7 @@ Use `--set inferencePool.modelServerType=triton-tensorrt-llm` to install the `in
 
 Add the following to the `flags` in the helm chart as [flags to EPP](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/29ea29028496a638b162ff287c62c0087211bbe5/config/charts/inferencepool/values.yaml#L36)
 
+
 ```
 - --total-queued-requests-metric
 - "nv_trt_llm_request_metrics{request_type=waiting}"
@@ -32,10 +33,12 @@ Use `--set inferencePool.modelServerType=triton-tensorrt-llm` to install the `in
 - "" # Set an empty metric to disable LoRA metric scraping as they are not supported by Triton yet.
 ```
 
+
 ## SGLang
 
 Add the following `flags` while deploying using helm charts in the [EPP deployment](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/29ea29028496a638b162ff287c62c0087211bbe5/config/charts/inferencepool/values.yaml#L36)
 
+
 ```
 - --totalQueuedRequestsMetric
 - "sglang:num_queue_reqs"
