21 changes: 21 additions & 0 deletions config/charts/inferencepool/README.md
@@ -45,6 +45,26 @@ Then apply it with:
$ helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f values.yaml
```

### Install with Custom EPP Plugins Configuration

To supply a custom EPP plugins configuration, pass it as inline YAML under `inferenceExtension.pluginsCustomConfig`. For example:

```yaml
inferenceExtension:
  pluginsCustomConfig:
    custom-plugins.yaml: |
      apiVersion: inference.networking.x-k8s.io/v1alpha1
      kind: EndpointPickerConfig
      plugins:
      - type: custom-scorer
        parameters:
          custom-threshold: 64
      schedulingProfiles:
      - name: default
        plugins:
        - pluginRef: custom-scorer
```
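
As a minimal sketch of the full workflow, the configuration above can be written to a values file and passed to the same `helm install` command used earlier. The file name `custom-plugins-values.yaml` is an assumption chosen for illustration, and the release name is reused from the earlier example:

```shell
# Write the custom EPP plugins configuration to a values file.
# The file name is hypothetical; any name works as long as -f points at it.
cat > custom-plugins-values.yaml <<'EOF'
inferenceExtension:
  pluginsCustomConfig:
    custom-plugins.yaml: |
      apiVersion: inference.networking.x-k8s.io/v1alpha1
      kind: EndpointPickerConfig
      plugins:
      - type: custom-scorer
        parameters:
          custom-threshold: 64
      schedulingProfiles:
      - name: default
        plugins:
        - pluginRef: custom-scorer
EOF

# Install the chart with that values file (skipped if helm is not on PATH).
if command -v helm >/dev/null 2>&1; then
  helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f custom-plugins-values.yaml
fi
```

The quoted heredoc delimiter (`'EOF'`) keeps the shell from expanding anything inside the YAML, so the indentation and content reach the file unchanged.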

### Install with Additional Ports

To expose additional ports (e.g., for ZMQ), you can define them in the `values.yaml` file:
@@ -126,6 +146,7 @@ The following table list the configurable parameters of the chart.
| `inferenceExtension.affinity` | Affinity for the endpoint picker. Defaults to `{}`. |
| `inferenceExtension.tolerations` | Tolerations for the endpoint picker. Defaults to `[]`. |
| `inferenceExtension.flags.has-enable-leader-election` | Enable leader election for high availability. When enabled, only one EPP pod (the leader) will be ready to serve traffic. |
| `inferenceExtension.pluginsCustomConfig` | Custom plugins configuration passed to the EPP as inline YAML. |
| `provider.name` | Name of the Inference Gateway implementation being used. Possible values: `gke`. Defaults to `none`. |

## Notes