Commit 799fc49

restructuring and feedback comments
1 parent b3fa0a2 commit 799fc49

4 files changed: +18 -12 lines changed

pkg/README.md

Lines changed: 13 additions & 12 deletions
@@ -1,8 +1,10 @@
 ## Quickstart

+This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get a first, single InferencePool up and running!
+
 ### Requirements
 - Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher
-- A cluster that has built-in support for `ServiceType=LoadBalancer`.
+- A cluster that has built-in support for `ServiceType=LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running)
 - For example, with Kind, you can follow these steps: https://kind.sigs.k8s.io/docs/user/loadbalancer

 ### Steps
@@ -22,6 +24,11 @@
 make install
 ```

+Alternatively, you can run:
+```sh
+kubectl apply -f config/crd/bases
+```
+
 1. **Deploy InferenceModel and InferencePool**

 Deploy a sample InferenceModel and InferencePool configuration based on the vLLM deployments mentioned above.
@@ -43,10 +50,10 @@
 ```bash
 kubectl apply -f ./manifests/gateway/gateway.yaml
 ```
-> **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file.
+> **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy are very useful.***


-Should you wish to experiment on the same gateway, a new `Backend` and `HTTPRoute` will need to be created per route/pool you would like.
+

 1. **Deploy Ext-Proc**

@@ -57,8 +64,8 @@
 1. **Deploy Envoy Gateway Custom Policies**

 ```bash
-kubectl apply -f ./manifests/extension_policy.yaml
-kubectl apply -f ./manifests/patch_policy.yaml
+kubectl apply -f ./manifests/gateway/extension_policy.yaml
+kubectl apply -f ./manifests/gateway/patch_policy.yaml
 ```
 > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.

@@ -84,10 +91,4 @@
 "max_tokens": 100,
 "temperature": 0
 }'
-```
-
-## Scheduling Package in Ext Proc
-The scheduling package implements request scheduling algorithms for load balancing requests across backend pods in an inference gateway. The scheduler ensures efficient resource utilization while maintaining low latency and prioritizing critical requests. It applies a series of filters based on metrics and heuristics to select the best pod for a given request.
-
-# Flowchart
-<img src="../docs/schedular-flowchart.png" alt="Scheduling Algorithm" width="400" />
+```
File renamed without changes.
File renamed without changes.

pkg/scheduling.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+## Scheduling Package in Ext Proc
+The scheduling package implements request scheduling algorithms for load balancing requests across backend pods in an inference gateway. The scheduler ensures efficient resource utilization while maintaining low latency and prioritizing critical requests. It applies a series of filters based on metrics and heuristics to select the best pod for a given request.
+
+# Flowchart
+<img src="../docs/schedular-flowchart.png" alt="Scheduling Algorithm" width="400" />
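
The "series of filters" idea described in `pkg/scheduling.md` can be illustrated with a rough sketch. This is not the scheduling package's code: the `Pod` fields, the two filters, the thresholds, and the fallback when a filter rejects every pod are all assumptions made for illustration, and Python is used only to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Pod:
    name: str
    queue_depth: int        # hypothetical metric: requests waiting on this pod
    kv_cache_usage: float   # hypothetical metric: fraction of KV cache in use, 0.0 to 1.0


# A filter takes the current candidates and returns the subset it accepts.
Filter = Callable[[List[Pod]], List[Pod]]


def max_queue_depth(limit: int) -> Filter:
    """Hypothetical filter: keep pods whose queue depth is at most `limit`."""
    def apply(pods: List[Pod]) -> List[Pod]:
        return [p for p in pods if p.queue_depth <= limit]
    return apply


def max_kv_cache_usage(limit: float) -> Filter:
    """Hypothetical filter: keep pods whose KV cache usage is at most `limit`."""
    def apply(pods: List[Pod]) -> List[Pod]:
        return [p for p in pods if p.kv_cache_usage <= limit]
    return apply


def pick_pod(pods: List[Pod], filters: List[Filter]) -> Pod:
    """Apply the filters in order, then pick the least-loaded survivor."""
    if not pods:
        raise ValueError("no pods to schedule onto")
    candidates = list(pods)
    for apply in filters:
        narrowed = apply(candidates)
        # Assumed fallback: a filter that rejects every pod is skipped,
        # so the request can still be routed somewhere.
        if narrowed:
            candidates = narrowed
    # Tie-break on the lightest load among the remaining candidates.
    return min(candidates, key=lambda p: (p.queue_depth, p.kv_cache_usage))


if __name__ == "__main__":
    pods = [
        Pod("pod-a", queue_depth=4, kv_cache_usage=0.90),
        Pod("pod-b", queue_depth=1, kv_cache_usage=0.35),
        Pod("pod-c", queue_depth=0, kv_cache_usage=0.95),
    ]
    best = pick_pod(pods, [max_queue_depth(2), max_kv_cache_usage(0.8)])
    print(best.name)  # pod-b
```

Modelling each filter as a function from candidates to candidates keeps the chain easy to reorder or extend; how the real scheduler orders its filters and handles critical requests is shown in the flowchart rather than in this sketch.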
