-
Notifications
You must be signed in to change notification settings - Fork 622
[0.7.3] optimize Qwen2.5 vl vit #623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: zouyida <[email protected]>
Signed-off-by: zouyida <[email protected]>
Signed-off-by: zouyida <[email protected]>
Signed-off-by: zouyida <[email protected]>
Signed-off-by: zouyida <[email protected]>
Signed-off-by: zouyida <[email protected]>
|
I'm fine with this change, this is for qwen2.5 vl performance improvement. @ganyi1996ppo please double check it. Thanks. |
|
|
||
|
|
||
| RotaryEmbedding.forward_oot = rope_forward_oot | ||
| MRotaryEmbedding.forward = mrope_forward |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basically, we don't want to change anything in vllm except foward_oot. For this kind of change, we should ask vllm to satisfy our requirement. Let's do it in the future. Thanks.
|
|
||
| q, k, v = (rearrange(x, "s b ... -> b s ...").contiguous() | ||
| for x in (q, k, v)) | ||
| q = torch_npu.npu_rotary_mul(q, cos, sin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any custom op here? Qwen2.5 VL, what's the shape of these q, k, cos and sin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function's functionality is as follows:
x1, x2 = torch.chunk(q, 2, -1)
x_new = torch.cat((-x2, x1), dim=-1)
output = cos * x + sin * x_new
I cann't find any custom op that meets my expectations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks exactly what normal rotary embedding do.....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can merge this PR first and optimize at next PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks for your advice, I will optimize it soon.
|
This should also be merged to main before v0.8.4.rc2. |
What this PR does / why we need it?
optimize Qwen2 5 vl vit with pta
Does this PR introduce any user-facing change?
no
How was this patch tested?
we've tested on benchmark and it proves to be equal.