|
1 | 1 | ## Quickstart
|
2 | 2 |
|
| 3 | +This quickstart guide is intended for engineers familiar with Kubernetes and model servers (vLLM in this instance). The goal of this guide is to get a first, single InferencePool up and running. |
| 4 | + |
3 | 5 | ### Requirements
|
4 | 6 | - Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher
|
5 |
| - - A cluster that has built-in support for `ServiceType=LoadBalancer`. |
| 7 | + - A cluster that has built-in support for `ServiceType=LoadBalancer`. (You can validate this by confirming that your Envoy Gateway is up and running.) |
6 | 8 | - For example, with Kind, you can follow these steps: https://kind.sigs.k8s.io/docs/user/loadbalancer
|
7 | 9 |
|
8 | 10 | ### Steps
|
|
22 | 24 | make install
|
23 | 25 | ```
|
24 | 26 |
|
| 27 | + Alternatively, you can run: |
| 28 | + ```sh |
| 29 | + kubectl apply -f config/crd/bases |
| 30 | + ``` |
| 31 | + |
25 | 32 | 1. **Deploy InferenceModel and InferencePool**
|
26 | 33 |
|
27 | 34 | Deploy a sample InferenceModel and InferencePool configuration based on the vLLM deployments mentioned above.
|
|
43 | 50 | ```bash
|
44 | 51 | kubectl apply -f ./manifests/gateway/gateway.yaml
|
45 | 52 | ```
|
46 |
| - > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file. |
| 53 | + > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy is very useful.*** |
47 | 54 |
|
48 | 55 |
|
49 |
| - Should you wish to experiment on the same gateway, a new `Backend` and `HTTPRoute` will need to be created per route/pool you would like. |
| 56 | + |
50 | 57 |
|
51 | 58 | 1. **Deploy Ext-Proc**
|
52 | 59 |
|
|
57 | 64 | 1. **Deploy Envoy Gateway Custom Policies**
|
58 | 65 |
|
59 | 66 | ```bash
|
60 |
| - kubectl apply -f ./manifests/extension_policy.yaml |
61 |
| - kubectl apply -f ./manifests/patch_policy.yaml |
| 67 | + kubectl apply -f ./manifests/gateway/extension_policy.yaml |
| 68 | + kubectl apply -f ./manifests/gateway/patch_policy.yaml |
62 | 69 | ```
|
63 | 70 | > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
|
64 | 71 |
|
|
84 | 91 | "max_tokens": 100,
|
85 | 92 | "temperature": 0
|
86 | 93 | }'
|
87 |
| - ``` |
88 |
| - |
89 |
| -## Scheduling Package in Ext Proc |
90 |
| -The scheduling package implements request scheduling algorithms for load balancing requests across backend pods in an inference gateway. The scheduler ensures efficient resource utilization while maintaining low latency and prioritizing critical requests. It applies a series of filters based on metrics and heuristics to select the best pod for a given request. |
91 |
| - |
92 |
| -# Flowchart |
93 |
| -<img src="../docs/schedular-flowchart.png" alt="Scheduling Algorithm" width="400" /> |
| 94 | + ``` |