nirrozenbaum
Contributor

@nirrozenbaum nirrozenbaum commented Jun 10, 2025

This PR lays the groundwork for an extensible EPP main with minimal changes.
It moves the code from the current main.go into a runner package under cmd/epp.
ALMOST NO CODE CHANGES were made, except for:

  • Making the run function public (it is now called Run).
  • Adding two simple functions that allow the extension points to be configured from any main.go, e.g., from the llm-d main.go.
    This is done via Runner.WithSchedulerConfig(...) or Runner.WithRequestControlConfig(...); the latter configures the PostResponse plugins.

That's it. NO OTHER CHANGES.

main.go now looks like the following:

func main() {
	if err := runner.NewRunner().Run(); err != nil {
		os.Exit(1)
	}
}
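
As a hedged illustration of the two configuration functions described above, the following self-contained sketch shows how a downstream main.go might chain them before calling Run. The Runner, SchedulerConfig and RequestControlConfig types below are simplified stand-ins written for this example, and their fields are hypothetical; only the names NewRunner, WithSchedulerConfig, WithRequestControlConfig and Run come from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// SchedulerConfig is a hypothetical stand-in for the real scheduler config.
type SchedulerConfig struct {
	Scorers []string // assumed field, for illustration only
}

// RequestControlConfig is a hypothetical stand-in for the real request
// control config, which the PR says carries the PostResponse plugins.
type RequestControlConfig struct {
	PostResponsePlugins []string // assumed field, for illustration only
}

// Runner is a simplified stand-in for the runner introduced by the PR.
type Runner struct {
	schedulerConfig      *SchedulerConfig
	requestControlConfig *RequestControlConfig
}

// NewRunner returns a Runner with empty default configuration.
func NewRunner() *Runner {
	return &Runner{
		schedulerConfig:      &SchedulerConfig{},
		requestControlConfig: &RequestControlConfig{},
	}
}

// WithSchedulerConfig overrides the scheduler configuration and returns the
// same Runner so that calls can be chained.
func (r *Runner) WithSchedulerConfig(cfg *SchedulerConfig) *Runner {
	r.schedulerConfig = cfg
	return r
}

// WithRequestControlConfig overrides the request control configuration and
// returns the same Runner so that calls can be chained.
func (r *Runner) WithRequestControlConfig(cfg *RequestControlConfig) *Runner {
	r.requestControlConfig = cfg
	return r
}

// Run stands in for the logic that used to live in main.go. The nil checks
// are an assumption of this sketch, not something the PR states.
func (r *Runner) Run() error {
	if r.schedulerConfig == nil || r.requestControlConfig == nil {
		return errors.New("runner: configuration must not be nil")
	}
	fmt.Printf("scorers=%v postResponse=%v\n",
		r.schedulerConfig.Scorers, r.requestControlConfig.PostResponsePlugins)
	return nil
}

func main() {
	r := NewRunner().
		WithSchedulerConfig(&SchedulerConfig{Scorers: []string{"custom-scorer"}}).
		WithRequestControlConfig(&RequestControlConfig{
			PostResponsePlugins: []string{"custom-post-response"},
		})
	if err := r.Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Returning the receiver from each With... function keeps the default main.go a one-liner while letting other projects add configuration without copying the startup code.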

** This PR does not aim to improve the main package, logging, or any structural issue other than the one mentioned in the intro. It is only the first step in a series of PRs that will be added to GIE to improve how we initialize and run EPP in an extensible manner.

cc: @ahg-g @kfswain @elevran


netlify bot commented Jun 10, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 658185a
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/684980614cac6800080bd206
😎 Deploy Preview https://deploy-preview-956--gateway-api-inference-extension.netlify.app
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 10, 2025
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jun 10, 2025
@nirrozenbaum
Contributor Author

@JeffLuoo are you able to assist?
I remember you added the CommitSHA part. This PR changes cmd/main and fails to build both locally and in CI/CD.
It seems to be related to CommitSHA. Do you know what I am missing?

[builder 11/11] RUN go build -ldflags="-X sigs.k8s.io/gateway-api-inference-extension/pkg/epp/metrics.CommitSHA=2162cc40dd02ee85876c675ed9ab003078ed66c0 -X sigs.k8s.io/gateway-api-inference-extension/pkg/epp/metrics.BuildRef=" -o /epp
#17 0.522 main.go:22:2: no required module provides package sigs.k8s.io/gateway-api-inference-extension/cmd/epp/runner; to add it:
#17 0.522 	go get sigs.k8s.io/gateway-api-inference-extension/cmd/epp/runner
#17 ERROR: process "/bin/sh -c go build -ldflags=\"-X sigs.k8s.io/gateway-api-inference-extension/pkg/epp/metrics.CommitSHA=${COMMIT_SHA} -X sigs.k8s.io/gateway-api-inference-extension/pkg/epp/metrics.BuildRef=${BUILD_REF}\" -o /epp" did not complete successfully: exit code: 1
------
 > [builder 11/11] RUN go build -ldflags="-X sigs.k8s.io/gateway-api-inference-extension/pkg/epp/metrics.CommitSHA=2162cc40dd02ee85876c675ed9ab003078ed66c0 -X sigs.k8s.io/gateway-api-inference-extension/pkg/epp/metrics.BuildRef=" -o /epp:
0.522 main.go:22:2: no required module provides package sigs.k8s.io/gateway-api-inference-extension/cmd/epp/runner; to add it:
0.522 	go get sigs.k8s.io/gateway-api-inference-extension/cmd/epp/runner
------
Dockerfile:26
--------------------
  24 |     COPY api ./api
  25 |     WORKDIR /src/cmd
  26 | >>> RUN go build -ldflags="-X sigs.k8s.io/gateway-api-inference-extension/pkg/epp/metrics.CommitSHA=${COMMIT_SHA} -X sigs.k8s.io/gateway-api-inference-extension/pkg/epp/metrics.BuildRef=${BUILD_REF}" -o /epp
  27 |     
  28 |     ## Multistage deploy
--------------------
ERROR: failed to solve: process "/bin/sh -c go build -ldflags=\"-X sigs.k8s.io/gateway-api-inference-extension/pkg/epp/metrics.CommitSHA=${COMMIT_SHA} -X sigs.k8s.io/gateway-api-inference-extension/pkg/epp/metrics.BuildRef=${BUILD_REF}\" -o /epp" did not complete successfully: exit code: 1
make: *** [Makefile:184: image-build] Error 1
+ EXIT_VALUE=2
+ set +o xtrace
Cleaning up after docker in docker.

@JeffLuoo
Contributor

COMMIT_SHA has the default value unknown (see https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/Dockerfile#L11). Can you try changing the line:

RUN go build -ldflags="-X sigs.k8s.io/gateway-api-inference-extension/pkg/epp/metrics.CommitSHA=${COMMIT_SHA} -X sigs.k8s.io/gateway-api-inference-extension/pkg/epp/metrics.BuildRef=${BUILD_REF}" -o /epp

to just:

RUN go build -o /epp

to see if it works?
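
For readers unfamiliar with the -X linker flag used in the build command above, here is a minimal sketch of the mechanism. It assumes a package-level string variable named CommitSHA in package main; in the repository the variable lives in pkg/epp/metrics, and the Go-side default of "unknown" used here is an assumption of this sketch (the thread only states that the Dockerfile's COMMIT_SHA argument defaults to unknown).

```go
package main

import "fmt"

// CommitSHA can be overwritten at link time with
//   go build -ldflags="-X main.CommitSHA=<sha>"
// The -X flag only works on package-level string variables. The default
// value below is an assumption made for this sketch.
var CommitSHA = "unknown"

// buildInfo formats the commit that the binary reports.
func buildInfo() string {
	return fmt.Sprintf("commit=%s", CommitSHA)
}

func main() {
	fmt.Println(buildInfo())
}
```

Built without -ldflags, the variable keeps its source default, which is why dropping the flags is a quick way to check whether they are the cause of a build failure.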

@ahg-g
Contributor

ahg-g commented Jun 10, 2025

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 11, 2025
@ahg-g
Contributor

ahg-g commented Jun 11, 2025

fyi, https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/951/files will cause a conflict

I put a hold on 951 since the parameters should move to the config api anyway

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 11, 2025
Signed-off-by: Nir Rozenbaum <[email protected]>
@nirrozenbaum
Contributor Author

@JeffLuoo OK, I think I found the problem. It was a required line change in the Dockerfile, unrelated to CommitSHA.
You can see the Dockerfile change in the Files tab.
Your question did help me find it, thanks :)
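
The thread does not say which Dockerfile line changed, so the following is only a sketch of one plausible way the "no required module provides package" error can arise. Go derives a package's import path from the module path plus the package directory relative to the module root, so if the build context places the runner directory somewhere other than cmd/epp/runner, the import in main.go no longer resolves. The importPath helper and the cmd/runner layout are hypothetical, introduced purely for illustration.

```go
package main

import (
	"fmt"
	"path"
)

// importPath returns the import path Go expects for a package directory:
// the module path joined with the directory relative to the module root.
// This helper is hypothetical and exists only for this illustration.
func importPath(modulePath, relDir string) string {
	return path.Join(modulePath, relDir)
}

func main() {
	const module = "sigs.k8s.io/gateway-api-inference-extension"

	// Layout that matches the import in main.go from the build log.
	fmt.Println(importPath(module, "cmd/epp/runner"))

	// Assumed layout inside the image if the sources were copied one level
	// higher; this path differs from the one main.go imports.
	fmt.Println(importPath(module, "cmd/runner"))
}
```

Under that assumption, the fix would be to make the directory layout inside the image mirror the layout in the repository, which is consistent with the author's note that the problem was unrelated to CommitSHA.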

@kfswain
Collaborator

kfswain commented Jun 11, 2025

Have we validated this manually? Hitting CI issues with the build makes me think we need some stronger validation

Contributor

@ahg-g ahg-g left a comment

@shmuelk this will cause a great conflict with your PR, need to figure out which one to get in first, probably this one since it is "simpler"

return r
}

func (r *Runner) Run() error {
Contributor

Is this an exact copy of the logic that existed in main.go? If not, can you point out the diff?

Contributor Author

Yes, it is an exact copy; the only difference is the initialization of the scheduler.

@nirrozenbaum
Contributor Author

@shmuelk this will cause a great conflict with your PR, need to figure out which one to get in first, probably this one since it is "simpler"

Right. I took it into consideration and talked with Shmuel. Things may still change here, but this simple change will allow us to get rid of all the main-related code duplication in llm-d.

We are working iteratively; when his big PR merges, we will make the necessary adaptations.

@ahg-g
Contributor

ahg-g commented Jun 12, 2025

Are you able to run the e2e test?

@ahg-g
Contributor

ahg-g commented Jun 12, 2025

/approve
/lgtm

I will approve this to unblock the config API PR. It will trigger GKE's internal regression test, but please also try to run the e2e test.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 12, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, nirrozenbaum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 12, 2025
@k8s-ci-robot k8s-ci-robot merged commit 1b5fb26 into kubernetes-sigs:main Jun 12, 2025
8 checks passed
@nirrozenbaum
Contributor Author

@ahg-g sure, the e2e tests pass. I missed your question.
image

@nirrozenbaum nirrozenbaum deleted the runner branch June 12, 2025 05:10
@nirrozenbaum
Contributor Author

Have we validated this manually? Hitting CI issues with the build makes me think we need some stronger validation

@kfswain sure 👍🏻

shmuelk pushed a commit to shmuelk/gateway-api-inference-extension that referenced this pull request Jun 15, 2025
* moved main code to runner package under epp/cmd

Signed-off-by: Nir Rozenbaum <[email protected]>

* added the ability to configure postResponse plugins in requestcontrol layer

Signed-off-by: Nir Rozenbaum <[email protected]>

* updated dockerfile

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.