Wider support for gelu. Remove unused activation. Use same torch layer for silu and swish #3302
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
I think we should create a simple `activations.py` file as is done here: https://github.com/huggingface/transformers/blob/main/src/transformers/activations.py and import from there.
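For illustration, a minimal sketch of what such a shared module could look like; the `get_activation` name and the exact mapping below are assumptions modeled on transformers' `activations.py`, not the final diffusers API:

```python
import torch.nn as nn

# Hypothetical shared registry; names are modeled on transformers' activations.py,
# not the actual diffusers API.
ACT2CLS = {
    "gelu": nn.GELU,
    "mish": nn.Mish,
    "relu": nn.ReLU,
    "silu": nn.SiLU,
    "swish": nn.SiLU,  # swish and silu are the same function, so they share one layer
}


def get_activation(act_fn: str) -> nn.Module:
    """Return an instance of the activation layer registered under `act_fn`."""
    if act_fn not in ACT2CLS:
        raise KeyError(f"Unsupported activation function: {act_fn}")
    return ACT2CLS[act_fn]()
```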
Probably @patrickvonplaten, but that would be more of a design choice, whereas this is a path to fix a current bug.
Specifically, for a UNet1D model, it only works (forward + backward pass) if an activation is specified, but no failure is raised upon module creation, only at runtime. Also, there is an inconsistency between the timestep embedding module and the rest, where the timestep embedding does not accept the same activation functions.

As you say, it would probably be better for the future to have all activations grouped in a single place. That would help with maintainability and avoid code duplication.
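For context, a hedged sketch of the kind of early check that would surface the problem at module creation rather than at runtime; the block, its parameters, and the supported-activation list are hypothetical, not the actual diffusers classes:

```python
import torch
import torch.nn as nn

SUPPORTED_ACT_FNS = ("gelu", "mish", "silu", "swish")  # illustrative list


class ToyBlock1D(nn.Module):
    """Hypothetical stand-in for a UNet1D sub-block, not the diffusers class."""

    def __init__(self, channels: int, act_fn: str = "mish"):
        super().__init__()
        if act_fn not in SUPPORTED_ACT_FNS:
            # Fail fast at construction instead of at the first forward/backward pass.
            raise ValueError(f"Unsupported activation function: {act_fn}")
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        if act_fn in ("silu", "swish"):
            self.act = nn.SiLU()
        elif act_fn == "mish":
            self.act = nn.Mish()
        else:
            self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.conv(x))
```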
cc @williamberman let's move all those activation functions into their own file.
@williamberman could you try to help move all the activation functions to their own file?
Hey @hypnopump, sorry for the delay! I put up a PR here, just since this branch is a bit behind tip and has some failing CI. Happy to merge this PR as well if you want to rebase and cherry-pick the commit from my branch, or what not. Up to you! :)
Makes sense. I'm closing this then, as the functionality is implemented in your newer PR (left a review btw :)).
Adds support for GeLU in modules where only `silu`/`swish` and `mish` were supported. Uses the same layer for `silu` and `swish` instead of a `lambda` definition for `swish` and the layer for `silu`.
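As a small illustrative check (not part of the PR itself), the silu and swish activations are numerically identical, which is why a single `nn.SiLU` layer can replace the `lambda`:

```python
import torch
import torch.nn as nn

x = torch.randn(4, 8)
silu_layer = nn.SiLU()                     # layer previously used for "silu"
swish_fn = lambda t: t * torch.sigmoid(t)  # lambda previously used for "swish"

# silu(x) = x * sigmoid(x) is exactly the swish function, so the outputs match.
assert torch.allclose(silu_layer(x), swish_fn(x))
```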