Skip to content

Conversation

@vasqu
Copy link
Contributor

@vasqu vasqu commented May 22, 2025

Scaling has been applied twice to OPT. This fixes it by passing 1.0 explicitly to avoid sdpa and co. to create a default scaling.

Fixes #38277

@vasqu vasqu marked this pull request as ready for review May 22, 2025 09:00
@github-actions github-actions bot requested a review from ArthurZucker May 22, 2025 09:00
@vasqu vasqu requested a review from zucchini-nlp May 22, 2025 09:05
Copy link
Member

@zucchini-nlp zucchini-nlp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for handling it! I think we should remove query_states = self.q_proj(hidden_states) * self.scaling to be consistent. Should be equivalent imo

@vasqu
Copy link
Contributor Author

vasqu commented May 22, 2025

Thought it was equivalent too but sadly it's not. Learned the hard lesson with whisper 😢

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Member

@zucchini-nlp zucchini-nlp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, okee, that's sad. Can we add a comment explaining why is so? Future us won't remember and refactor it out 😆

@vasqu
Copy link
Contributor Author

vasqu commented May 22, 2025

Haha, yea good point. I'll add a comment ^^

@vasqu
Copy link
Contributor Author

vasqu commented May 22, 2025

Added a comment

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@vasqu vasqu merged commit d03a3ca into main May 26, 2025
17 checks passed
@vasqu vasqu deleted the vas-fix-opt branch May 26, 2025 09:02
@DarkLight1337
Copy link

Can this also be included in the upcoming patch? It's required to pass vLLM tests

@vasqu vasqu added the for patch Tag issues / labels that should be included in the next patch label May 27, 2025
ArthurZucker pushed a commit that referenced this pull request May 27, 2025
* fix opt attention scaling

* add comment to why we do this
ArthurZucker pushed a commit that referenced this pull request May 27, 2025
* fix opt attention scaling

* add comment to why we do this
ArthurZucker pushed a commit that referenced this pull request May 27, 2025
* fix opt attention scaling

* add comment to why we do this
ArthurZucker pushed a commit that referenced this pull request May 28, 2025
* fix opt attention scaling

* add comment to why we do this
@ArthurZucker
Copy link
Collaborator

yes for sure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

for patch Tag issues / labels that should be included in the next patch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

large accuracy drop for opt-125m with V4.52.2

6 participants