From 4b58e5cf21ef4a97d16d78069baa0ce8c1d2e087 Mon Sep 17 00:00:00 2001
From: Kunjan
Date: Mon, 10 Feb 2025 18:07:15 -0800
Subject: [PATCH 01/13] Integrate dynamic-lora-sidecar into main guide and add
 makefile, cloudbuild to build and publish lora-syncer image

Signed-off-by: Kunjan
---
 site-src/guides/index.md | 68 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/site-src/guides/index.md b/site-src/guides/index.md
index e4cbec6f6..a0d368122 100644
--- a/site-src/guides/index.md
+++ b/site-src/guides/index.md
@@ -19,6 +19,74 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/deployment.yaml
    ```
+   **OPTIONALLY**: Enable Dynamic loading of Lora adapters.
+
+   [Deploy sample vllm deployment with Dynamic lora adapter enabled and Lora syncer sidecar](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/tools/dynamic-lora-sidecar/deployment.yaml)
+
+   ***Safely rollout v2 adapter***
+
+   1. Update lora configmap
+
+      ``` yaml
+
+      apiVersion: v1
+      kind: ConfigMap
+      metadata:
+        name: dynamic-lora-config
+      data:
+        configmap.yaml: |
+          vLLMLoRAConfig:
+            ensureExist:
+              models:
+              - id: chatbot-v1
+                source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v1
+              - id: chatbot-v2
+                source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v2
+      ```
+
+   2. Configure a canary rollout with traffic split using LLMService. In this example, 10% of traffic to the chatbot model will be sent to v2.
+
+      ``` yaml
+      model:
+        name: chatbot
+        targetModels:
+        targetModelName: chatbot-v1
+        weight: 90
+        targetModelName: chatbot-v2
+        weight: 10
+      ```
+
+   3. Finish rollout by setting the traffic to the new version 100%.
+ ```yaml + model: + name: chatbot + targetModels: + targetModelName: chatbot-v2 + weight: 100 + ``` + + 4. Remove v1 from dynamic lora configmap. + ```yaml + apiVersion: v1 + kind: ConfigMap + metadata: + name: dynamic-lora-config + data: + configmap.yaml: | + vLLMLoRAConfig: + ensureExist: + models: + - id: chatbot-v2 + source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v2 + ensureNotExist: # Explicitly unregisters the adapter from model servers + models: + - id: chatbot-v1 + source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v1 + ``` + + + + 1. **Install the Inference Extension CRDs:** ```sh From 985ed8ef8ae2cc33d7e152e1244e625501806f32 Mon Sep 17 00:00:00 2001 From: Kunjan Date: Mon, 10 Feb 2025 18:07:15 -0800 Subject: [PATCH 02/13] Add makefile and cloudbuild file to build and push lora-syncer Signed-off-by: Kunjan --- .../vllm/deployment-with-syncer.yaml | 158 ++++++++++++++++++ pkg/manifests/vllm/deployment.yaml | 47 ++---- site-src/guides/dynamic-lora.md | 79 +++++++++ site-src/guides/index.md | 64 ------- tools/dynamic-lora-sidecar/Makefile | 59 +++++++ tools/dynamic-lora-sidecar/cloudbuild.yaml | 17 ++ 6 files changed, 325 insertions(+), 99 deletions(-) create mode 100644 pkg/manifests/vllm/deployment-with-syncer.yaml create mode 100644 site-src/guides/dynamic-lora.md create mode 100644 tools/dynamic-lora-sidecar/Makefile create mode 100644 tools/dynamic-lora-sidecar/cloudbuild.yaml diff --git a/pkg/manifests/vllm/deployment-with-syncer.yaml b/pkg/manifests/vllm/deployment-with-syncer.yaml new file mode 100644 index 000000000..9359123dd --- /dev/null +++ b/pkg/manifests/vllm/deployment-with-syncer.yaml @@ -0,0 +1,158 @@ +apiVersion: v1 +kind: Service +metadata: + name: vllm-llama2-7b-pool +spec: + selector: + app: vllm-llama2-7b-pool + ports: + - protocol: TCP + port: 8000 + targetPort: 8000 + type: ClusterIP +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: vllm-llama2-7b-pool +spec: + replicas: 3 + selector: + matchLabels: + app: vllm-llama2-7b-pool + 
template: + metadata: + labels: + app: vllm-llama2-7b-pool + spec: + containers: + - name: lora + image: "vllm/vllm-openai:latest" + imagePullPolicy: Always + command: ["python3", "-m", "vllm.entrypoints.openai.api_server"] + args: + - "--model" + - "meta-llama/Llama-2-7b-hf" + - "--tensor-parallel-size" + - "1" + - "--port" + - "8000" + - "--enable-lora" + - "--max-loras" + - "4" + - "--max-cpu-loras" + - "12" + - "--lora-modules" + - '{"name": "sql-lora-0", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' + - '{"name": "sql-lora-1", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' + - '{"name": "sql-lora-2", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' + - '{"name": "sql-lora-3", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' + - '{"name": "sql-lora-4", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' + - '{"name": "tweet-summary-0", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' + - '{"name": "tweet-summary-1", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' + - '{"name": "tweet-summary-2", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' + - '{"name": "tweet-summary-3", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' + - '{"name": "tweet-summary-4", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' + - '{"name": "sql-lora", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' + - '{"name": "tweet-summary", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' + env: + - name: PORT + value: "8000" + - name: HUGGING_FACE_HUB_TOKEN + valueFrom: + secretKeyRef: + name: hf-token + key: token + - name: VLLM_ALLOW_RUNTIME_LORA_UPDATING + value: "true" + ports: + - 
containerPort: 8000 + name: http + protocol: TCP + livenessProbe: + failureThreshold: 240 + httpGet: + path: /health + port: http + scheme: HTTP + initialDelaySeconds: 5 + periodSeconds: 5 + successThreshold: 1 + timeoutSeconds: 1 + readinessProbe: + failureThreshold: 600 + httpGet: + path: /health + port: http + scheme: HTTP + initialDelaySeconds: 5 + periodSeconds: 5 + successThreshold: 1 + timeoutSeconds: 1 + resources: + limits: + nvidia.com/gpu: 1 + requests: + nvidia.com/gpu: 1 + volumeMounts: + - mountPath: /data + name: data + - mountPath: /dev/shm + name: shm + - name: adapters + mountPath: "/adapters" + initContainers: + - name: lora-adapter-syncer + tty: true + stdin: true + image: #Replace image + restartPolicy: Always + imagePullPolicy: Always + env: + - name: DYNAMIC_LORA_ROLLOUT_CONFIG + value: "/config/configmap.yaml" + volumeMounts: # DO NOT USE subPath + - name: config-volume + mountPath: /config + restartPolicy: Always + schedulerName: default-scheduler + terminationGracePeriodSeconds: 30 + volumes: + - name: data + emptyDir: {} + - name: shm + emptyDir: + medium: Memory + - name: adapters + emptyDir: {} + - name: config-volume + configMap: + name: dynamic-lora-config + +--- + +apiVersion: v1 +kind: ConfigMap +metadata: + name: dynamic-lora-config +data: + configmap.yaml: | + vLLMLoRAConfig: + name: sql-loras-llama + port: 8000 + ensureExist: + models: + - base-model: meta-llama/Llama-2-7b-hf + id: sql-lora-v1 + source: yard1/llama-2-7b-sql-lora-test + - base-model: meta-llama/Llama-2-7b-hf + id: sql-lora-v3 + source: yard1/llama-2-7b-sql-lora-test + - base-model: meta-llama/Llama-2-7b-hf + id: sql-lora-v4 + source: yard1/llama-2-7b-sql-lora-test + ensureNotExist: + models: + - base-model: meta-llama/Llama-2-7b-hf + id: sql-lora-v2 + source: yard1/llama-2-7b-sql-lora-test \ No newline at end of file diff --git a/pkg/manifests/vllm/deployment.yaml b/pkg/manifests/vllm/deployment.yaml index 4af0891d7..8ea95365b 100644 --- 
a/pkg/manifests/vllm/deployment.yaml +++ b/pkg/manifests/vllm/deployment.yaml @@ -43,18 +43,18 @@ spec: - "--max-cpu-loras" - "12" - "--lora-modules" - - "sql-lora=/adapters/hub/models--yard1--llama-2-7b-sql-lora-test/snapshots/0dfa347e8877a4d4ed19ee56c140fa518470028c/" - - "tweet-summary=/adapters/hub/models--vineetsharma--qlora-adapter-Llama-2-7b-hf-TweetSumm/snapshots/796337d8e866318c59e38f16416e3ecd11fe5403" - - 'sql-lora-0=/adapters/yard1/llama-2-7b-sql-lora-test_0' - - 'sql-lora-1=/adapters/yard1/llama-2-7b-sql-lora-test_1' - - 'sql-lora-2=/adapters/yard1/llama-2-7b-sql-lora-test_2' - - 'sql-lora-3=/adapters/yard1/llama-2-7b-sql-lora-test_3' - - 'sql-lora-4=/adapters/yard1/llama-2-7b-sql-lora-test_4' - - 'tweet-summary-0=/adapters/vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm_0' - - 'tweet-summary-1=/adapters/vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm_1' - - 'tweet-summary-2=/adapters/vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm_2' - - 'tweet-summary-3=/adapters/vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm_3' - - 'tweet-summary-4=/adapters/vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm_4' + - '{"name": "sql-lora-0", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' + - '{"name": "sql-lora-1", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' + - '{"name": "sql-lora-2", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' + - '{"name": "sql-lora-3", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' + - '{"name": "sql-lora-4", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' + - '{"name": "tweet-summary-0", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' + - '{"name": "tweet-summary-1", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' + - '{"name": "tweet-summary-2", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", 
"base_model_name": "llama-2"}' + - '{"name": "tweet-summary-3", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' + - '{"name": "tweet-summary-4", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' + - '{"name": "sql-lora", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' + - '{"name": "tweet-summary", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' env: - name: PORT value: "8000" @@ -99,29 +99,6 @@ spec: name: shm - name: adapters mountPath: "/adapters" - initContainers: - - name: adapter-loader - image: ghcr.io/tomatillo-and-multiverse/adapter-puller:demo - command: ["python"] - args: - - ./pull_adapters.py - - --adapter - - yard1/llama-2-7b-sql-lora-test - - --adapter - - vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm - - --duplicate-count - - "5" - env: - - name: HF_TOKEN - valueFrom: - secretKeyRef: - name: hf-token - key: token - - name: HF_HOME - value: /adapters - volumeMounts: - - name: adapters - mountPath: "/adapters" restartPolicy: Always schedulerName: default-scheduler terminationGracePeriodSeconds: 30 diff --git a/site-src/guides/dynamic-lora.md b/site-src/guides/dynamic-lora.md new file mode 100644 index 000000000..5356c7e73 --- /dev/null +++ b/site-src/guides/dynamic-lora.md @@ -0,0 +1,79 @@ +# Getting started with Gateway API Inference Extension with Dynamic lora updates on vllm + +The goal of this guide is to get a single InferencePool running with VLLM and demonstrate use of dynamic lora updating ! + +### Requirements + - Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher + - A cluster with: + - Support for Services of type `LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running). For example, with Kind, + you can follow [these steps](https://kind.sigs.k8s.io/docs/user/loadbalancer). 
+  - 3 GPUs to run the sample model server. Adjust the number of replicas in `./manifests/vllm/deployment.yaml` as needed.
+
+### Steps
+
+1. **Deploy a sample vLLM model server with dynamic LoRA updates enabled and the LoRA syncer sidecar**
+   [Deploy the sample vLLM deployment with dynamic LoRA adapters enabled, plus the LoRA syncer sidecar and configmap](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/manifests/vllm/dynamic-lora-sidecar/deployment.yaml)
+
+The rest of the steps are the same as in the [general setup](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/site-src/guides/index.md).
+
+
+### Safely roll out the v2 adapter
+
+1. Update the LoRA configmap.
+
+``` yaml
+
+    apiVersion: v1
+    kind: ConfigMap
+    metadata:
+      name: dynamic-lora-config
+    data:
+      configmap.yaml: |
+        vLLMLoRAConfig:
+          ensureExist:
+            models:
+            - id: tweet-summary-v1
+              source: tweet-summary-1=/adapters/vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm_1
+            - id: tweet-summary-v2
+              source: tweet-summary-2=/adapters/vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm_2
+```
+
+2. Configure a canary rollout with traffic split using LLMService. In this example, 10% of traffic to the chatbot model will be sent to v2.
+
+``` yaml
+model:
+  name: chatbot
+  targetModels:
+  targetModelName: chatbot-v1
+  weight: 90
+  targetModelName: chatbot-v2
+  weight: 10
+```
+
+3. Finish rollout by setting the traffic to the new version 100%.
+```yaml
+model:
+  name: chatbot
+  targetModels:
+  targetModelName: chatbot-v2
+  weight: 100
+```
+
+4. Remove v1 from the dynamic LoRA configmap.
+```yaml + apiVersion: v1 + kind: ConfigMap + metadata: + name: dynamic-lora-config + data: + configmap.yaml: | + vLLMLoRAConfig: + ensureExist: + models: + - id: chatbot-v2 + source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v2 + ensureNotExist: # Explicitly unregisters the adapter from model servers + models: + - id: chatbot-v1 + source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v1 +``` diff --git a/site-src/guides/index.md b/site-src/guides/index.md index a0d368122..2cc971c61 100644 --- a/site-src/guides/index.md +++ b/site-src/guides/index.md @@ -19,70 +19,6 @@ This quickstart guide is intended for engineers familiar with k8s and model serv kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/deployment.yaml ``` - **OPTIONALLY**: Enable Dynamic loading of Lora adapters. - - [Deploy sample vllm deployment with Dynamic lora adapter enabled and Lora syncer sidecar](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/tools/dynamic-lora-sidecar/deployment.yaml) - - ***Safely rollout v2 adapter*** - - 1. Update lora configmap - - ``` yaml - - apiVersion: v1 - kind: ConfigMap - metadata: - name: dynamic-lora-config - data: - configmap.yaml: | - vLLMLoRAConfig: - ensureExist: - models: - - id: chatbot-v1 - source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v1 - - id: chatbot-v2 - source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v2 - ``` - - 2. Configure a canary rollout with traffic split using LLMService. In this example, 10% of traffic to the chatbot model will be sent to v2. - - ``` yaml - model: - name: chatbot - targetModels: - targetModelName: chatbot-v1 - weight: 90 - targetModelName: chatbot-v2 - weight: 10 - ``` - - 3. Finish rollout by setting the traffic to the new version 100%. 
- ```yaml - model: - name: chatbot - targetModels: - targetModelName: chatbot-v2 - weight: 100 - ``` - - 4. Remove v1 from dynamic lora configmap. - ```yaml - apiVersion: v1 - kind: ConfigMap - metadata: - name: dynamic-lora-config - data: - configmap.yaml: | - vLLMLoRAConfig: - ensureExist: - models: - - id: chatbot-v2 - source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v2 - ensureNotExist: # Explicitly unregisters the adapter from model servers - models: - - id: chatbot-v1 - source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v1 - ``` diff --git a/tools/dynamic-lora-sidecar/Makefile b/tools/dynamic-lora-sidecar/Makefile new file mode 100644 index 000000000..93f7672d2 --- /dev/null +++ b/tools/dynamic-lora-sidecar/Makefile @@ -0,0 +1,59 @@ +IMAGE_NAME := lora-syncer +IMAGE_REGISTRY ?= us-central1-docker.pkg.dev/k8s-staging-images/llm-instance-gateway +IMAGE_REPO ?= $(IMAGE_REGISTRY)/$(IMAGE_NAME) + +GIT_TAG ?= $(shell git describe --tags --dirty --always) +EXTRA_TAG ?= $(if $(_PULL_BASE_REF),$(_PULL_BASE_REF),main) +IMAGE_TAG ?= $(IMAGE_REPO):$(GIT_TAG) +EXTRA_IMAGE_TAG ?= $(IMAGE_REPO):$(EXTRA_TAG) + + +PLATFORMS ?= linux/amd64 + + +DOCKER_BUILDX_CMD ?= docker buildx +IMAGE_BUILD_CMD ?= $(DOCKER_BUILDX_CMD) build +IMAGE_BUILD_EXTRA_OPTS ?= + +# --- Targets --- +.PHONY: image-local-build +image-local-build: + BUILDER=$(shell $(DOCKER_BUILDX_CMD) create --use) + $(MAKE) image-build PUSH=$(PUSH) + $(DOCKER_BUILDX_CMD) rm $$BUILDER + +.PHONY: image-local-push +image-local-push: PUSH=--push +image-local-push: image-local-build + +.PHONY: image-build +image-build: + $(IMAGE_BUILD_CMD) -t $(IMAGE_TAG) \ + --platform=$(PLATFORMS) \ + --build-arg BASE_IMAGE=$(BASE_IMAGE) \ + --build-arg BUILDER_IMAGE=$(BUILDER_IMAGE) \ + $(PUSH) \ + $(IMAGE_BUILD_EXTRA_OPTS) ./ + +.PHONY: image-push +image-push: PUSH=--push +image-push: image-build + +.PHONY: run +run: + docker run -v $(CURDIR)/config:/config -u appuser $(IMAGE_TAG) # Use the user name + +.PHONY: clean +clean: + docker rmi $(IMAGE_TAG) 
$(EXTRA_IMAGE_TAG) 2>/dev/null || true + +.PHONY: clean-dangling +clean-dangling: + docker rmi $(docker images -f "dangling=true" -q) 2>/dev/null || true + +.PHONY: test +test: + python -m unittest discover + +.PHONY: all +all: test image-build \ No newline at end of file diff --git a/tools/dynamic-lora-sidecar/cloudbuild.yaml b/tools/dynamic-lora-sidecar/cloudbuild.yaml new file mode 100644 index 000000000..e91a238a6 --- /dev/null +++ b/tools/dynamic-lora-sidecar/cloudbuild.yaml @@ -0,0 +1,17 @@ +# See https://cloud.google.com/cloud-build/docs/build-config +timeout: 3000s + +steps: + - name: gcr.io/k8s-testimages/gcb-docker-gcloud:v20220830-45cbff55bc + entrypoint: make + args: + - image-push + env: + - GIT_TAG=$_GIT_TAG + - EXTRA_TAG=$_PULL_BASE_REF + - DOCKER_BUILDX_CMD=/buildx-entrypoint + +substitutions: + _GIT_TAG: '0.0.0' # Default value for Git tag + _PULL_BASE_REF: 'main' # Default value for branch/tag +# No options needed! \ No newline at end of file From 03b274136525bcdd138984d104840f0dbd49f85d Mon Sep 17 00:00:00 2001 From: Kunjan Date: Mon, 10 Feb 2025 18:07:15 -0800 Subject: [PATCH 03/13] Add makefile and cloudbuild file to build and push lora-syncer Signed-off-by: Kunjan --- Makefile | 29 ++++++++++++++++++++++ cloudbuild.yaml | 8 ++++++ tools/dynamic-lora-sidecar/cloudbuild.yaml | 17 ------------- 3 files changed, 37 insertions(+), 17 deletions(-) delete mode 100644 tools/dynamic-lora-sidecar/cloudbuild.yaml diff --git a/Makefile b/Makefile index b7654ed71..f2198844c 100644 --- a/Makefile +++ b/Makefile @@ -31,6 +31,10 @@ IMAGE_NAME := epp IMAGE_REPO ?= $(IMAGE_REGISTRY)/$(IMAGE_NAME) IMAGE_TAG ?= $(IMAGE_REPO):$(GIT_TAG) +SYNCER_IMAGE_NAME := lora-syncer +SYNCER_IMAGE_REPO ?= $(IMAGE_REGISTRY)/$(IMAGE_NAME) +SYNCER_IMAGE_TAG ?= $(IMAGE_REPO):$(GIT_TAG) + BASE_IMAGE ?= gcr.io/distroless/base-debian10 BUILDER_IMAGE ?= golang:1.23-alpine ifdef GO_VERSION @@ -163,6 +167,31 @@ image-build: ## Build the EPP image using Docker Buildx. 
image-push: PUSH=--push ## Build the EPP image and push it to $IMAGE_REPO. image-push: image-build +##@ Lora Syncer + +.PHONY: syncer-image-local-build +syncer-image-local-build: + BUILDER=$(shell $(DOCKER_BUILDX_CMD) create --use) + $(MAKE) image-build PUSH=$(PUSH) + $(DOCKER_BUILDX_CMD) rm $$BUILDER + +.PHONY: syncer-image-local-push +syncer-image-local-push: PUSH=--push +syncer-image-local-push: syncer-image-local-build + +.PHONY: syncer-image-build +syncer-image-build: + $ cd $(CURDIR)/tools/dynamic-lora-sidecar && $(IMAGE_BUILD_CMD) -t $(SYNCER_IMAGE_TAG) \ + --platform=$(PLATFORMS) \ + --build-arg BASE_IMAGE=$(BASE_IMAGE) \ + --build-arg BUILDER_IMAGE=$(BUILDER_IMAGE) \ + $(PUSH) \ + $(IMAGE_BUILD_EXTRA_OPTS) ./ + +.PHONY: syncer-image-push +syncer-image-push: PUSH=--push +syncer-image-push: syncer-image-build + .PHONY: image-load image-load: LOAD=--load ## Build the EPP image and load it in the local Docker registry. image-load: image-build diff --git a/cloudbuild.yaml b/cloudbuild.yaml index 2da147f4a..40e45923e 100644 --- a/cloudbuild.yaml +++ b/cloudbuild.yaml @@ -12,6 +12,14 @@ steps: - GIT_TAG=$_GIT_TAG - EXTRA_TAG=$_PULL_BASE_REF - DOCKER_BUILDX_CMD=/buildx-entrypoint + - name: lora-adapter-syncer + entrypoint: make + args: + - syncer-image-push + env: + - GIT_TAG=$_GIT_TAG + - EXTRA_TAG=$_PULL_BASE_REF + - DOCKER_BUILDX_CMD=/buildx-entrypoint substitutions: # _GIT_TAG will be filled with a git-based tag for the image, of the form vYYYYMMDD-hash, and # can be used as a substitution diff --git a/tools/dynamic-lora-sidecar/cloudbuild.yaml b/tools/dynamic-lora-sidecar/cloudbuild.yaml deleted file mode 100644 index e91a238a6..000000000 --- a/tools/dynamic-lora-sidecar/cloudbuild.yaml +++ /dev/null @@ -1,17 +0,0 @@ -# See https://cloud.google.com/cloud-build/docs/build-config -timeout: 3000s - -steps: - - name: gcr.io/k8s-testimages/gcb-docker-gcloud:v20220830-45cbff55bc - entrypoint: make - args: - - image-push - env: - - GIT_TAG=$_GIT_TAG - - 
EXTRA_TAG=$_PULL_BASE_REF - - DOCKER_BUILDX_CMD=/buildx-entrypoint - -substitutions: - _GIT_TAG: '0.0.0' # Default value for Git tag - _PULL_BASE_REF: 'main' # Default value for branch/tag -# No options needed! \ No newline at end of file From 3271c3f9dc878e8a0a0d666ea9c77f189944fce8 Mon Sep 17 00:00:00 2001 From: Kunjan Date: Thu, 13 Feb 2025 15:08:47 -0800 Subject: [PATCH 04/13] Update site-src/guides/dynamic-lora.md Co-authored-by: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com> --- site-src/guides/dynamic-lora.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site-src/guides/dynamic-lora.md b/site-src/guides/dynamic-lora.md index 5356c7e73..0cfd514a3 100644 --- a/site-src/guides/dynamic-lora.md +++ b/site-src/guides/dynamic-lora.md @@ -38,7 +38,7 @@ Rest of the steps are same as [general setup](https://github.com/kubernetes-sigs source: tweet-summary-2=/adapters/vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm_2 ``` -2. Configure a canary rollout with traffic split using LLMService. In this example, 10% of traffic to the chatbot model will be sent to v2. +2. Configure a canary rollout with traffic split using InferenceModel. In this example, 10% of traffic to the chatbot model will be sent to `tweet-summary-3`. 
``` yaml model: From 62adbb1ee57708beed65b0245ad12fcecfa5c723 Mon Sep 17 00:00:00 2001 From: Kunjan Date: Thu, 13 Feb 2025 15:12:32 -0800 Subject: [PATCH 05/13] Update site-src/guides/dynamic-lora.md Co-authored-by: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com> --- site-src/guides/dynamic-lora.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site-src/guides/dynamic-lora.md b/site-src/guides/dynamic-lora.md index 0cfd514a3..a842ebd5d 100644 --- a/site-src/guides/dynamic-lora.md +++ b/site-src/guides/dynamic-lora.md @@ -1,6 +1,6 @@ # Getting started with Gateway API Inference Extension with Dynamic lora updates on vllm -The goal of this guide is to get a single InferencePool running with VLLM and demonstrate use of dynamic lora updating ! +The goal of this guide is to get a single InferencePool running with vLLM and demonstrate use of dynamic lora updating! ### Requirements - Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher From 6f5b9e71fa09b1bedb6203c75b5921aad301e78f Mon Sep 17 00:00:00 2001 From: Kunjan Date: Mon, 10 Feb 2025 18:07:15 -0800 Subject: [PATCH 06/13] Add makefile and cloudbuild file to build and push lora-syncer Signed-off-by: Kunjan --- .../vllm/deployment-with-syncer.yaml | 25 ++------ pkg/manifests/vllm/deployment.yaml | 10 ---- site-src/guides/dynamic-lora.md | 58 ++++++++++++------- 3 files changed, 42 insertions(+), 51 deletions(-) diff --git a/pkg/manifests/vllm/deployment-with-syncer.yaml b/pkg/manifests/vllm/deployment-with-syncer.yaml index 9359123dd..b32d3eb14 100644 --- a/pkg/manifests/vllm/deployment-with-syncer.yaml +++ b/pkg/manifests/vllm/deployment-with-syncer.yaml @@ -43,18 +43,8 @@ spec: - "--max-cpu-loras" - "12" - "--lora-modules" - - '{"name": "sql-lora-0", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' - - '{"name": "sql-lora-1", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' - - 
'{"name": "sql-lora-2", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' - - '{"name": "sql-lora-3", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' - - '{"name": "sql-lora-4", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' - '{"name": "tweet-summary-0", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' - '{"name": "tweet-summary-1", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' - - '{"name": "tweet-summary-2", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' - - '{"name": "tweet-summary-3", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' - - '{"name": "tweet-summary-4", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' - - '{"name": "sql-lora", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' - - '{"name": "tweet-summary", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' env: - name: PORT value: "8000" @@ -143,16 +133,13 @@ data: ensureExist: models: - base-model: meta-llama/Llama-2-7b-hf - id: sql-lora-v1 - source: yard1/llama-2-7b-sql-lora-test + id: tweet-summary-0 + source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm - base-model: meta-llama/Llama-2-7b-hf - id: sql-lora-v3 - source: yard1/llama-2-7b-sql-lora-test - - base-model: meta-llama/Llama-2-7b-hf - id: sql-lora-v4 - source: yard1/llama-2-7b-sql-lora-test + id: tweet-summary-1 + source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm ensureNotExist: models: - base-model: meta-llama/Llama-2-7b-hf - id: sql-lora-v2 - source: yard1/llama-2-7b-sql-lora-test \ No newline at end of file + id: tweet-summary-2 + source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm \ No newline at end of file diff --git a/pkg/manifests/vllm/deployment.yaml 
b/pkg/manifests/vllm/deployment.yaml index 8ea95365b..1d115f4d4 100644 --- a/pkg/manifests/vllm/deployment.yaml +++ b/pkg/manifests/vllm/deployment.yaml @@ -43,18 +43,8 @@ spec: - "--max-cpu-loras" - "12" - "--lora-modules" - - '{"name": "sql-lora-0", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' - - '{"name": "sql-lora-1", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' - - '{"name": "sql-lora-2", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' - - '{"name": "sql-lora-3", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' - - '{"name": "sql-lora-4", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' - '{"name": "tweet-summary-0", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' - '{"name": "tweet-summary-1", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' - - '{"name": "tweet-summary-2", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' - - '{"name": "tweet-summary-3", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' - - '{"name": "tweet-summary-4", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' - - '{"name": "sql-lora", "path": "yard1/llama-2-7b-sql-lora-test", "base_model_name": "llama-2"}' - - '{"name": "tweet-summary", "path": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm", "base_model_name": "llama-2"}' env: - name: PORT value: "8000" diff --git a/site-src/guides/dynamic-lora.md b/site-src/guides/dynamic-lora.md index a842ebd5d..a4f8ba0b9 100644 --- a/site-src/guides/dynamic-lora.md +++ b/site-src/guides/dynamic-lora.md @@ -29,33 +29,40 @@ Rest of the steps are same as [general setup](https://github.com/kubernetes-sigs name: dynamic-lora-config data: configmap.yaml: | - vLLMLoRAConfig: - ensureExist: - models: - 
- id: tweet-summary-v1 - source: tweet-summary-1=/adapters/vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm_1 - - id: tweet-summary-v2 - source: tweet-summary-2=/adapters/vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm_2 + vLLMLoRAConfig: + name: sql-loras-llama + port: 8000 + ensureExist: + models: + - base-model: meta-llama/Llama-2-7b-hf + id: tweet-summary-0 + source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm + - base-model: meta-llama/Llama-2-7b-hf + id: tweet-summary-1 + source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm ``` -2. Configure a canary rollout with traffic split using InferenceModel. In this example, 10% of traffic to the chatbot model will be sent to `tweet-summary-3`. +2. Configure a canary rollout with traffic split using LLMService. In this example, 40% of traffic for tweet-summary model will be sent to the ***tweet-summary-2*** adapter . ``` yaml model: - name: chatbot + name: tweet-summary targetModels: - targetModelName: chatbot-v1 - weight: 90 - targetModelName: chatbot-v2 + targetModelName: tweet-summary-0 weight: 10 + targetModelName: tweet-summary-1 + weight: 40 + targetModelName: tweet-summary-2 + weight: 40 + ``` 3. Finish rollout by setting the traffic to the new version 100%. 
```yaml model: - name: chatbot + name: tweet-summary targetModels: - targetModelName: chatbot-v2 + targetModelName: tweet-summary-2 weight: 100 ``` @@ -68,12 +75,19 @@ model: data: configmap.yaml: | vLLMLoRAConfig: - ensureExist: - models: - - id: chatbot-v2 - source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v2 - ensureNotExist: # Explicitly unregisters the adapter from model servers - models: - - id: chatbot-v1 - source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v1 + name: sql-loras-llama + port: 8000 + ensureExist: + models: + - base-model: meta-llama/Llama-2-7b-hf + id: tweet-summary-2 + source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm + ensureNotExist: + models: + - base-model: meta-llama/Llama-2-7b-hf + id: tweet-summary-1 + source: gs://[HUGGING FACE PATH] + - base-model: meta-llama/Llama-2-7b-hf + id: tweet-summary-0 + source: gs://[HUGGING FACE PATH] ``` From 78b9bfecf2939430ffd70367f2acd979e0cdd5b4 Mon Sep 17 00:00:00 2001 From: Daneyon Hansen Date: Thu, 13 Feb 2025 18:20:20 -0500 Subject: [PATCH 07/13] Adds image-load and kind-load Make targets (#288) Signed-off-by: Daneyon Hansen --- Makefile | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/Makefile b/Makefile index f2198844c..f6edb6300 100644 --- a/Makefile +++ b/Makefile @@ -167,6 +167,14 @@ image-build: ## Build the EPP image using Docker Buildx. image-push: PUSH=--push ## Build the EPP image and push it to $IMAGE_REPO. image-push: image-build +.PHONY: image-load +image-load: LOAD=--load ## Build the EPP image and load it in the local Docker registry. +image-load: image-build + +.PHONY: image-kind +image-kind: image-build ## Build the EPP image and load it to kind cluster $KIND_CLUSTER ("kind" by default). 
+ kind load docker-image $(IMAGE_TAG) --name $(KIND_CLUSTER) + ##@ Lora Syncer .PHONY: syncer-image-local-build From 9c367f9bc75587f1c829b091f9297bb8417093a7 Mon Sep 17 00:00:00 2001 From: Kunjan Date: Mon, 10 Feb 2025 18:07:15 -0800 Subject: [PATCH 08/13] Add makefile and cloudbuild file to build and push lora-syncer Signed-off-by: Kunjan --- Makefile | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/Makefile b/Makefile index f6edb6300..312be9e7c 100644 --- a/Makefile +++ b/Makefile @@ -167,6 +167,31 @@ image-build: ## Build the EPP image using Docker Buildx. image-push: PUSH=--push ## Build the EPP image and push it to $IMAGE_REPO. image-push: image-build +##@ Lora Syncer + +.PHONY: syncer-image-local-build +syncer-image-local-build: + BUILDER=$(shell $(DOCKER_BUILDX_CMD) create --use) + $(MAKE) image-build PUSH=$(PUSH) + $(DOCKER_BUILDX_CMD) rm $$BUILDER + +.PHONY: syncer-image-local-push +syncer-image-local-push: PUSH=--push +syncer-image-local-push: syncer-image-local-build + +.PHONY: syncer-image-build +syncer-image-build: + $ cd $(CURDIR)/tools/dynamic-lora-sidecar && $(IMAGE_BUILD_CMD) -t $(SYNCER_IMAGE_TAG) \ + --platform=$(PLATFORMS) \ + --build-arg BASE_IMAGE=$(BASE_IMAGE) \ + --build-arg BUILDER_IMAGE=$(BUILDER_IMAGE) \ + $(PUSH) \ + $(IMAGE_BUILD_EXTRA_OPTS) ./ + +.PHONY: syncer-image-push +syncer-image-push: PUSH=--push +syncer-image-push: syncer-image-build + .PHONY: image-load image-load: LOAD=--load ## Build the EPP image and load it in the local Docker registry. 
image-load: image-build From 5b31a4cd4d98f38431a805c1204c14ac1e4991f4 Mon Sep 17 00:00:00 2001 From: Kunjan Date: Thu, 13 Feb 2025 16:00:28 -0800 Subject: [PATCH 09/13] Add build targets for lora syncer Signed-off-by: Kunjan --- Makefile | 38 ++----------------- site-src/guides/dynamic-lora.md | 5 ++- tools/dynamic-lora-sidecar/Makefile | 59 ----------------------------- 3 files changed, 8 insertions(+), 94 deletions(-) delete mode 100644 tools/dynamic-lora-sidecar/Makefile diff --git a/Makefile b/Makefile index 312be9e7c..348bdd1f5 100644 --- a/Makefile +++ b/Makefile @@ -26,6 +26,7 @@ PLATFORMS ?= linux/amd64 DOCKER_BUILDX_CMD ?= docker buildx IMAGE_BUILD_CMD ?= $(DOCKER_BUILDX_CMD) build IMAGE_BUILD_EXTRA_OPTS ?= +SYNCER_IMAGE_BUILD_EXTRA_OPTS ?= IMAGE_REGISTRY ?= us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension IMAGE_NAME := epp IMAGE_REPO ?= $(IMAGE_REGISTRY)/$(IMAGE_NAME) @@ -43,9 +44,11 @@ endif ifdef EXTRA_TAG IMAGE_EXTRA_TAG ?= $(IMAGE_REPO):$(EXTRA_TAG) +SYNCER_IMAGE_EXTRA_TAG ?= $(SYNCER_IMAGE_REPO):$(EXTRA_TAG) endif ifdef IMAGE_EXTRA_TAG IMAGE_BUILD_EXTRA_OPTS += -t $(IMAGE_EXTRA_TAG) +SYNCER_IMAGE_BUILD_EXTRA_OPTS += -t $(SYNCER_IMAGE_EXTRA_TAG) endif # The name of the kind cluster to use for the "kind-load" target. @@ -167,31 +170,6 @@ image-build: ## Build the EPP image using Docker Buildx. image-push: PUSH=--push ## Build the EPP image and push it to $IMAGE_REPO. 
image-push: image-build -##@ Lora Syncer - -.PHONY: syncer-image-local-build -syncer-image-local-build: - BUILDER=$(shell $(DOCKER_BUILDX_CMD) create --use) - $(MAKE) image-build PUSH=$(PUSH) - $(DOCKER_BUILDX_CMD) rm $$BUILDER - -.PHONY: syncer-image-local-push -syncer-image-local-push: PUSH=--push -syncer-image-local-push: syncer-image-local-build - -.PHONY: syncer-image-build -syncer-image-build: - $ cd $(CURDIR)/tools/dynamic-lora-sidecar && $(IMAGE_BUILD_CMD) -t $(SYNCER_IMAGE_TAG) \ - --platform=$(PLATFORMS) \ - --build-arg BASE_IMAGE=$(BASE_IMAGE) \ - --build-arg BUILDER_IMAGE=$(BUILDER_IMAGE) \ - $(PUSH) \ - $(IMAGE_BUILD_EXTRA_OPTS) ./ - -.PHONY: syncer-image-push -syncer-image-push: PUSH=--push -syncer-image-push: syncer-image-build - .PHONY: image-load image-load: LOAD=--load ## Build the EPP image and load it in the local Docker registry. image-load: image-build @@ -219,20 +197,12 @@ syncer-image-build: --build-arg BASE_IMAGE=$(BASE_IMAGE) \ --build-arg BUILDER_IMAGE=$(BUILDER_IMAGE) \ $(PUSH) \ - $(IMAGE_BUILD_EXTRA_OPTS) ./ + $(SYNCER_IMAGE_BUILD_EXTRA_OPTS) ./ .PHONY: syncer-image-push syncer-image-push: PUSH=--push syncer-image-push: syncer-image-build -.PHONY: image-load -image-load: LOAD=--load ## Build the EPP image and load it in the local Docker registry. -image-load: image-build - -.PHONY: image-kind -image-kind: image-build ## Build the EPP image and load it to kind cluster $KIND_CLUSTER ("kind" by default). 
- kind load docker-image $(IMAGE_TAG) --name $(KIND_CLUSTER) - ##@ Docs .PHONY: build-docs diff --git a/site-src/guides/dynamic-lora.md b/site-src/guides/dynamic-lora.md index a4f8ba0b9..948c2d365 100644 --- a/site-src/guides/dynamic-lora.md +++ b/site-src/guides/dynamic-lora.md @@ -40,6 +40,9 @@ Rest of the steps are same as [general setup](https://github.com/kubernetes-sigs - base-model: meta-llama/Llama-2-7b-hf id: tweet-summary-1 source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm + - base-model: meta-llama/Llama-2-7b-hf + id: tweet-summary-2 + source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm ``` 2. Configure a canary rollout with traffic split using LLMService. In this example, 40% of traffic for tweet-summary model will be sent to the ***tweet-summary-2*** adapter . @@ -49,7 +52,7 @@ model: name: tweet-summary targetModels: targetModelName: tweet-summary-0 - weight: 10 + weight: 20 targetModelName: tweet-summary-1 weight: 40 targetModelName: tweet-summary-2 diff --git a/tools/dynamic-lora-sidecar/Makefile b/tools/dynamic-lora-sidecar/Makefile deleted file mode 100644 index 93f7672d2..000000000 --- a/tools/dynamic-lora-sidecar/Makefile +++ /dev/null @@ -1,59 +0,0 @@ -IMAGE_NAME := lora-syncer -IMAGE_REGISTRY ?= us-central1-docker.pkg.dev/k8s-staging-images/llm-instance-gateway -IMAGE_REPO ?= $(IMAGE_REGISTRY)/$(IMAGE_NAME) - -GIT_TAG ?= $(shell git describe --tags --dirty --always) -EXTRA_TAG ?= $(if $(_PULL_BASE_REF),$(_PULL_BASE_REF),main) -IMAGE_TAG ?= $(IMAGE_REPO):$(GIT_TAG) -EXTRA_IMAGE_TAG ?= $(IMAGE_REPO):$(EXTRA_TAG) - - -PLATFORMS ?= linux/amd64 - - -DOCKER_BUILDX_CMD ?= docker buildx -IMAGE_BUILD_CMD ?= $(DOCKER_BUILDX_CMD) build -IMAGE_BUILD_EXTRA_OPTS ?= - -# --- Targets --- -.PHONY: image-local-build -image-local-build: - BUILDER=$(shell $(DOCKER_BUILDX_CMD) create --use) - $(MAKE) image-build PUSH=$(PUSH) - $(DOCKER_BUILDX_CMD) rm $$BUILDER - -.PHONY: image-local-push -image-local-push: PUSH=--push -image-local-push: 
image-local-build - -.PHONY: image-build -image-build: - $(IMAGE_BUILD_CMD) -t $(IMAGE_TAG) \ - --platform=$(PLATFORMS) \ - --build-arg BASE_IMAGE=$(BASE_IMAGE) \ - --build-arg BUILDER_IMAGE=$(BUILDER_IMAGE) \ - $(PUSH) \ - $(IMAGE_BUILD_EXTRA_OPTS) ./ - -.PHONY: image-push -image-push: PUSH=--push -image-push: image-build - -.PHONY: run -run: - docker run -v $(CURDIR)/config:/config -u appuser $(IMAGE_TAG) # Use the user name - -.PHONY: clean -clean: - docker rmi $(IMAGE_TAG) $(EXTRA_IMAGE_TAG) 2>/dev/null || true - -.PHONY: clean-dangling -clean-dangling: - docker rmi $(docker images -f "dangling=true" -q) 2>/dev/null || true - -.PHONY: test -test: - python -m unittest discover - -.PHONY: all -all: test image-build \ No newline at end of file From 2846d6a3b24c3245d025bebe2a25170cd4e89ab4 Mon Sep 17 00:00:00 2001 From: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com> Date: Fri, 14 Feb 2025 13:06:58 -0800 Subject: [PATCH 10/13] Apply suggestions from code review --- Makefile | 4 ++-- pkg/manifests/vllm/deployment-with-syncer.yaml | 2 +- site-src/guides/dynamic-lora.md | 1 - 3 files changed, 3 insertions(+), 4 deletions(-) diff --git a/Makefile b/Makefile index 348bdd1f5..1d8fc531c 100644 --- a/Makefile +++ b/Makefile @@ -33,8 +33,8 @@ IMAGE_REPO ?= $(IMAGE_REGISTRY)/$(IMAGE_NAME) IMAGE_TAG ?= $(IMAGE_REPO):$(GIT_TAG) SYNCER_IMAGE_NAME := lora-syncer -SYNCER_IMAGE_REPO ?= $(IMAGE_REGISTRY)/$(IMAGE_NAME) -SYNCER_IMAGE_TAG ?= $(IMAGE_REPO):$(GIT_TAG) +SYNCER_IMAGE_REPO ?= $(IMAGE_REGISTRY)/$(SYNCER_IMAGE_NAME) +SYNCER_IMAGE_TAG ?= $(SYNCER_IMAGE_REPO):$(GIT_TAG) BASE_IMAGE ?= gcr.io/distroless/base-debian10 BUILDER_IMAGE ?= golang:1.23-alpine diff --git a/pkg/manifests/vllm/deployment-with-syncer.yaml b/pkg/manifests/vllm/deployment-with-syncer.yaml index b32d3eb14..d6110f4b1 100644 --- a/pkg/manifests/vllm/deployment-with-syncer.yaml +++ b/pkg/manifests/vllm/deployment-with-syncer.yaml @@ -95,7 +95,7 @@ spec: - name: lora-adapter-syncer tty: true stdin: 
true - image: #Replace image + image: us-central1-docker.pkg.dev/ahg-gke-dev/jobset2/lora-syncer:6dc97be restartPolicy: Always imagePullPolicy: Always env: diff --git a/site-src/guides/dynamic-lora.md b/site-src/guides/dynamic-lora.md index 948c2d365..e2396d69b 100644 --- a/site-src/guides/dynamic-lora.md +++ b/site-src/guides/dynamic-lora.md @@ -22,7 +22,6 @@ Rest of the steps are same as [general setup](https://github.com/kubernetes-sigs 1. Update lora configmap ``` yaml - apiVersion: v1 kind: ConfigMap metadata: From 6bbbacb9585b2d52c38276f76b8f17b57f68f947 Mon Sep 17 00:00:00 2001 From: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com> Date: Fri, 14 Feb 2025 13:09:10 -0800 Subject: [PATCH 11/13] Apply suggestions from code review --- site-src/guides/dynamic-lora.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/site-src/guides/dynamic-lora.md b/site-src/guides/dynamic-lora.md index e2396d69b..f10bb47f8 100644 --- a/site-src/guides/dynamic-lora.md +++ b/site-src/guides/dynamic-lora.md @@ -12,16 +12,16 @@ The goal of this guide is to get a single InferencePool running with vLLM and de ### Steps 1. **Deploy Sample VLLM Model Server with dynamic lora update enabled and dynamic lora syncer sidecar ** - [Deploy sample vllm deployment with Dynamic lora adapter enabled and Lora syncer sidecar and configmap](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/manifests/vllm/dynamic-lora-sidecar/deployment.yaml) + [Redeploy the vLLM deployment with Dynamic lora adapter enabled and Lora syncer sidecar and configmap](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/manifests/vllm/dynamic-lora-sidecar/deployment.yaml) Rest of the steps are same as [general setup](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/site-src/guides/index.md) ### Safely rollout v2 adapter -1. Update lora configmap +1. 
Update the LoRA syncer ConfigMap to make the new adapter version available on the model servers. -``` yaml +```yaml apiVersion: v1 kind: ConfigMap metadata: @@ -46,7 +46,7 @@ Rest of the steps are same as [general setup](https://github.com/kubernetes-sigs 2. Configure a canary rollout with traffic split using LLMService. In this example, 40% of traffic for tweet-summary model will be sent to the ***tweet-summary-2*** adapter . -``` yaml +```yaml model: name: tweet-summary targetModels: From ebfaa6ef2c6d500ffe18300eb07901cb2efc3049 Mon Sep 17 00:00:00 2001 From: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com> Date: Fri, 14 Feb 2025 13:10:25 -0800 Subject: [PATCH 12/13] Apply suggestions from code review --- site-src/guides/dynamic-lora.md | 1 - 1 file changed, 1 deletion(-) diff --git a/site-src/guides/dynamic-lora.md b/site-src/guides/dynamic-lora.md index f10bb47f8..0f9c31893 100644 --- a/site-src/guides/dynamic-lora.md +++ b/site-src/guides/dynamic-lora.md @@ -42,7 +42,6 @@ Rest of the steps are same as [general setup](https://github.com/kubernetes-sigs - base-model: meta-llama/Llama-2-7b-hf id: tweet-summary-2 source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm - ``` 2. Configure a canary rollout with traffic split using LLMService. In this example, 40% of traffic for tweet-summary model will be sent to the ***tweet-summary-2*** adapter . 
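The canary examples edited throughout this series split traffic for the `tweet-summary` model across adapter versions by relative `weight` (e.g. 90/10 for a 10% canary, then a single entry at 100 to finish the rollout). As a hypothetical illustration only — `pick_target` and its weight-normalization behavior are a sketch of the intended semantics, not code from this series or from the LLMService implementation — the split can be modeled as:

```python
import random

def pick_target(target_models, r=None):
    """Pick a target model name given (name, weight) pairs.

    Weights are relative, as in the LLMService examples in this series
    (e.g. tweet-summary-1: 90, tweet-summary-2: 10 for a 10% canary).
    `r` is a number in [0, 1); if omitted, one is drawn at random.
    """
    if r is None:
        r = random.random()
    total = sum(weight for _, weight in target_models)
    cumulative = 0.0
    for name, weight in target_models:
        cumulative += weight / total
        if r < cumulative:
            return name
    return target_models[-1][0]  # guard against floating-point rounding

canary = [("tweet-summary-1", 90), ("tweet-summary-2", 10)]
# r below 0.9 falls in the stable adapter's share; above it, the canary's.
print(pick_target(canary, r=0.50))  # tweet-summary-1
print(pick_target(canary, r=0.95))  # tweet-summary-2
```

Under this model, the final rollout step in the guide corresponds to collapsing the list to a single entry (`tweet-summary-2` at weight 100), after which the old adapter can be moved to `ensureNotExist` in the syncer ConfigMap.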
From 277125c4b72f531f266f6350f030db7650192c38 Mon Sep 17 00:00:00 2001 From: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com> Date: Fri, 14 Feb 2025 13:11:38 -0800 Subject: [PATCH 13/13] Apply suggestions from code review --- site-src/guides/dynamic-lora.md | 1 - 1 file changed, 1 deletion(-) diff --git a/site-src/guides/dynamic-lora.md b/site-src/guides/dynamic-lora.md index 0f9c31893..ef3c2b0f8 100644 --- a/site-src/guides/dynamic-lora.md +++ b/site-src/guides/dynamic-lora.md @@ -42,7 +42,6 @@ Rest of the steps are same as [general setup](https://github.com/kubernetes-sigs - base-model: meta-llama/Llama-2-7b-hf id: tweet-summary-2 source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm - 2. Configure a canary rollout with traffic split using LLMService. In this example, 40% of traffic for tweet-summary model will be sent to the ***tweet-summary-2*** adapter . ```yaml