[`OPT`] Fix attention scaling #38290

vasqu · 2025-05-22T08:59:50Z

Scaling was being applied twice in OPT. This fixes it by explicitly passing a scaling of 1.0, so that sdpa and co. don't apply their own default scaling on top.

Fixes #38277
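For readers following along, here is a minimal sketch of the double-scaling problem in plain Python. It is not the OPT code: `attention` is a hypothetical toy stand-in for an sdpa-style kernel, and the only behavior it borrows from PyTorch's `scaled_dot_product_attention` is that leaving `scale` unset defaults to `1/sqrt(head_dim)`. The tensors and the `max_abs_diff` helper are made up for illustration.

```python
import math

def attention(q, k, v, scale=None):
    """Toy single-head attention (hypothetical stand-in for an sdpa-style kernel).

    As with PyTorch's scaled_dot_product_attention, scale=None means
    the kernel applies 1/sqrt(head_dim) itself.
    """
    head_dim = len(q[0])
    if scale is None:
        scale = 1.0 / math.sqrt(head_dim)
    out = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return out

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested lists."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

head_dim = 4
scaling = head_dim ** -0.5
q = [[1.0, 2.0, 0.5, -1.0], [0.0, 1.0, -2.0, 3.0]]
k = [[0.5, -1.0, 2.0, 1.0], [1.5, 0.0, -0.5, 2.0], [2.0, 1.0, 1.0, 0.0]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

# The query is scaled up front, as in q_proj(hidden_states) * self.scaling.
q_scaled = [[x * scaling for x in row] for row in q]

reference = attention(q, k, v)                  # scaled once, inside the kernel
double_scaled = attention(q_scaled, k, v)       # bug: default scaling applied again
fixed = attention(q_scaled, k, v, scale=1.0)    # fix: kernel scaling disabled

print("double-scaled error:", max_abs_diff(reference, double_scaled))
print("fixed error:", max_abs_diff(reference, fixed))
```

With the query already scaled, passing `scale=1.0` is what keeps the kernel from scaling a second time. Leaving it unset changes the attention weights and therefore the output.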

zucchini-nlp

Thanks for handling it! I think we should remove the scaling in `query_states = self.q_proj(hidden_states) * self.scaling` to be consistent. Should be equivalent imo

vasqu · 2025-05-22T09:15:25Z

Thought it was equivalent too, but sadly it's not. Learned that lesson the hard way with Whisper 😢

HuggingFaceDocBuilderDev · 2025-05-22T09:26:50Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

zucchini-nlp

Wow, okee, that's sad. Can we add a comment explaining why that is? Otherwise future us won't remember and will refactor it out 😆

vasqu · 2025-05-22T09:41:38Z

Haha, yea good point. I'll add a comment ^^

vasqu · 2025-05-22T12:25:34Z

Added a comment

ArthurZucker

Thanks!

DarkLight1337 · 2025-05-26T15:37:10Z

Can this also be included in the upcoming patch? It's required for the vLLM tests to pass.

ArthurZucker · 2025-05-28T08:39:04Z

yes for sure

fix opt attention scaling

abc06b4

vasqu marked this pull request as ready for review May 22, 2025 09:00

github-actions bot requested a review from ArthurZucker May 22, 2025 09:00

vasqu requested a review from zucchini-nlp May 22, 2025 09:05

Merge branch 'main' into vas-fix-opt

dc51611

zucchini-nlp reviewed May 22, 2025

vasqu mentioned this pull request May 22, 2025

🔴[Attention] Attention refactor for Whisper-based models #38235

Merged

zucchini-nlp approved these changes May 22, 2025

add comment to why we do this

d2c7002

ArthurZucker approved these changes May 26, 2025

vasqu merged commit d03a3ca into main May 26, 2025
17 checks passed

vasqu deleted the vas-fix-opt branch May 26, 2025 09:02

vasqu added the `for patch` label (tag issues / labels that should be included in the next patch) May 27, 2025

ArthurZucker pushed a commit that referenced this pull request May 27, 2025

[OPT] Fix attention scaling (#38290)

3013642

* fix opt attention scaling
* add comment to why we do this

ArthurZucker pushed a commit that referenced this pull request May 27, 2025

[OPT] Fix attention scaling (#38290)

98b1be3

* fix opt attention scaling
* add comment to why we do this

ArthurZucker pushed a commit that referenced this pull request May 27, 2025

[OPT] Fix attention scaling (#38290)

f5d15e6

* fix opt attention scaling
* add comment to why we do this

ArthurZucker pushed a commit that referenced this pull request May 28, 2025

[OPT] Fix attention scaling (#38290)

66d32ab

* fix opt attention scaling
* add comment to why we do this
