[Hardware][TPU][V1] Multi-LoRA Optimisations for the V1 TPU backend #15655
Signed-off-by: Akshat Tripathi <[email protected]>
…because xla doesn't allow partial updates Signed-off-by: Akshat Tripathi <[email protected]>
This reverts commit b78b088. Signed-off-by: Akshat Tripathi <[email protected]>
LGTM, thanks for the contribution!
LGTM, thanks!
Nice work optimizing LoRA here! I just had some minor notes; please take a look when you find the time. Otherwise, we can address them in a separate PR if need be.
This pull request has merge conflicts that must be resolved before it can be merged.
Head branch was pushed to by a user without write access
LGTM, sorry for delaying the merge a bit! Let's get this landed today.
No worries! Yep, I'm hoping once these tests pass we can merge it in. Would you mind re-enabling auto-merge?
…llm-project#15655) Signed-off-by: Akshat Tripathi <[email protected]> Signed-off-by: Chengji Yao <[email protected]> Signed-off-by: xihajun <[email protected]> Signed-off-by: Jorge de Freitas <[email protected]> Signed-off-by: Jorge de Freitas <[email protected]> Co-authored-by: Chengji Yao <[email protected]> Co-authored-by: xihajun <[email protected]> Co-authored-by: Jorge de Freitas <[email protected]> Co-authored-by: Jorge de Freitas <[email protected]> Signed-off-by: amit <[email protected]>
I have a few questions about the data published in this PR:
I also have a few questions about the "Hot Swapping" and "Compare Multi-LoRAs" tabs in this link: https://insights.krai.ai/benchmarking-multi-lora
Hi @amanocha, thanks for your interest.
As for the questions about the website.
Summary
This PR optimises the Multi-LoRA implementation from #14238. This one should be merged in after it.
This includes several kernel optimisations:
And a few general ones:
expand op a82f3fe
Things left/RFC
LogitsProcessorWithLoRA introduces a long (~1.5 second) stall when it's enabled, but not much activity seems to happen on the CPU or TPU during this time. I've disabled this for now.
LogitsProcessorWithLoRA is always created even if there's no LoRA adapter that needs it, is there a reason for this?
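To make the second open question concrete, here is a minimal sketch of creating the LoRA logits wrapper only when some adapter needs it. Everything in it is hypothetical and is not vLLM's API: `Adapter`, `LoRALogitsWrapper`, `base_logits_processor`, and `build_logits_processor` are illustrative stand-ins, and using a non-zero `extra_vocab_size` as the "needs it" condition is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Logits = List[float]


@dataclass
class Adapter:
    """Hypothetical stand-in for a LoRA adapter (not a vLLM class)."""
    name: str
    # Assumption for this sketch: a value > 0 means the adapter
    # affects the logits layer and therefore needs the wrapper.
    extra_vocab_size: int = 0


def base_logits_processor(logits: Logits) -> Logits:
    """Placeholder for the plain, unwrapped logits computation."""
    return logits


class LoRALogitsWrapper:
    """Hypothetical wrapper standing in for adapter-aware logits handling."""

    def __init__(self, inner: Callable[[Logits], Logits]) -> None:
        self.inner = inner
        self.calls = 0  # counts how often the wrapped path is taken

    def __call__(self, logits: Logits) -> Logits:
        self.calls += 1
        return self.inner(logits)


def build_logits_processor(
    adapters: Sequence[Adapter],
) -> Callable[[Logits], Logits]:
    """Return the wrapper only if at least one adapter needs it."""
    if any(a.extra_vocab_size > 0 for a in adapters):
        return LoRALogitsWrapper(base_logits_processor)
    return base_logits_processor
```

With this shape, a deployment whose adapters never touch the logits layer would skip the wrapped path entirely. Whether that is safe in vLLM depends on details this thread does not cover, such as adapters being loaded after the model is built.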