
Conversation

mayank31398
Contributor

This PR adds support for shared experts in the GraniteMoE model class for the upcoming Granite 4.0 language models.
@ArthurZucker
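
For readers unfamiliar with the idea: a shared expert is a dense MLP that every token passes through in addition to its routed experts, with the two outputs summed. Below is a minimal illustrative sketch; the class and parameter names (`SharedExpertMLP`, `shared_intermediate_size`) are assumptions, not necessarily the exact ones in the merged GraniteMoeShared code.

```python
import torch
import torch.nn as nn

class SharedExpertMLP(nn.Module):
    """Sketch of a shared expert: a dense, gated (SwiGLU-style) MLP applied
    to every token, alongside the routed MoE experts. Names are assumptions,
    not the exact ones in the merged GraniteMoeShared code."""

    def __init__(self, hidden_size: int, shared_intermediate_size: int):
        super().__init__()
        # One projection produces both the gate branch and the up branch.
        self.input_linear = nn.Linear(hidden_size, 2 * shared_intermediate_size, bias=False)
        self.output_linear = nn.Linear(shared_intermediate_size, hidden_size, bias=False)
        self.act = nn.SiLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        gate, up = self.input_linear(hidden_states).chunk(2, dim=-1)
        return self.output_linear(self.act(gate) * up)

# Inside a decoder layer the two contributions are summed, roughly:
#   hidden_states = routed_moe(hidden_states) + shared_mlp(hidden_states)
```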

@mayank31398
Contributor Author

@ArthurZucker can you merge this?
All checks have passed.

Collaborator

@ArthurZucker ArthurZucker left a comment


Hey! As always, for us this will be a new model, but with modular it should be super easy to add! https://huggingface.co/docs/transformers/en/modular_transformers 🤗
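
For context, the modular workflow referenced above lets a new model be expressed as a small file of deltas against an existing one, which the repo tooling (`utils/modular_model_converter.py`) expands into a flat, standalone modeling file. A rough, hypothetical sketch of what such a file could look like here; the class names and the `shared_intermediate_size` field are assumptions, not the exact merged code:

```python
# modular_granitemoeshared.py -- hypothetical sketch of a modular definition.
import torch.nn as nn

from transformers.models.granitemoe.configuration_granitemoe import GraniteMoeConfig
from transformers.models.granitemoe.modeling_granitemoe import (
    GraniteMoeDecoderLayer,
    GraniteMoeForCausalLM,
    GraniteMoeModel,
)

class GraniteMoeSharedConfig(GraniteMoeConfig):
    def __init__(self, shared_intermediate_size=1024, **kwargs):
        super().__init__(**kwargs)
        # Width of the always-on shared expert (assumed field name).
        self.shared_intermediate_size = shared_intermediate_size

class GraniteMoeSharedDecoderLayer(GraniteMoeDecoderLayer):
    def __init__(self, config, layer_idx: int):
        super().__init__(config, layer_idx)
        # Stand-in for the shared-expert MLP; its output would be added to
        # the routed-experts output in forward().
        self.shared_mlp = nn.Sequential(
            nn.Linear(config.hidden_size, config.shared_intermediate_size, bias=False),
            nn.SiLU(),
            nn.Linear(config.shared_intermediate_size, config.hidden_size, bias=False),
        )

# Classes that are unchanged apart from the name are simply re-declared;
# the converter rewires internal references when generating the flat file.
class GraniteMoeSharedModel(GraniteMoeModel):
    pass

class GraniteMoeSharedForCausalLM(GraniteMoeForCausalLM):
    pass
```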

@shawntan
Contributor

shawntan commented Feb 5, 2025

We're adding an additional feature (shared experts) that doesn't break past checkpoints, and is an extension of our own model class. Would every extension entail a new model class?

@ArthurZucker
Collaborator

Yes 🤗 I am sorry but that is the way we have been handling every single model so far!

@shawntan
Contributor

shawntan commented Feb 7, 2025

Not sure how to get the tests to pass; some of the failures are not due to the changes I've made.

@mayank31398 mayank31398 marked this pull request as draft February 11, 2025 17:17
@mayank31398 mayank31398 marked this pull request as ready for review February 13, 2025 00:41
@mayank31398
Contributor Author

@ArthurZucker the PR is ready, please review.
The failing tests seem unrelated.

@ArthurZucker ArthurZucker self-requested a review February 13, 2025 09:55
Collaborator

@ArthurZucker ArthurZucker left a comment


Super clean! Super nice!



class GraniteMoeSharedForCausalLM(GraniteMoeForCausalLM):
    _tied_weights_keys = ["lm_head.weight"]
Collaborator


If the MLP is shared, should it appear here?

Contributor Author


No, it shouldn't.
"Shared" means shared in the sense of experts (a dense expert that every token passes through, alongside the routed ones), not shared across layers.
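
To spell out the distinction: `_tied_weights_keys` lists parameters that literally share storage with another parameter (here `lm_head.weight`, tied to the input embeddings), so the checkpoint stores the tensor once. A shared expert, by contrast, is an ordinary per-layer module with its own weights. A small runnable illustration with generic names (not GraniteMoeShared code):

```python
import torch.nn as nn

embed = nn.Embedding(32, 8)
lm_head = nn.Linear(8, 32, bias=False)

# Weight tying: both modules point at the same tensor, which is why
# "lm_head.weight" appears in _tied_weights_keys -- it is skipped at
# save time and re-tied at load time.
lm_head.weight = embed.weight
assert lm_head.weight is embed.weight

# A shared expert is just a per-layer submodule: each layer owns its own
# parameters, nothing is tied, so nothing belongs in _tied_weights_keys.
shared_mlps = nn.ModuleList(nn.Linear(8, 8) for _ in range(2))
assert shared_mlps[0].weight is not shared_mlps[1].weight
```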

@mayank31398
Contributor Author

Thanks for approving, please merge as soon as possible :)

@Ssukriti
Contributor

I have updated with the main branch, made the corresponding changes, and all checks have passed :)

@ArthurZucker ArthurZucker merged commit a570e2b into huggingface:main Feb 14, 2025
23 checks passed
@mayank31398 mayank31398 deleted the shared-experts branch February 14, 2025 15:58
ArthurZucker pushed a commit that referenced this pull request Feb 17, 2025
* Modular GraniteMoE with shared Experts.

Signed-off-by: Shawn Tan <[email protected]>

* Modified

* Import order.

* Modified for style

* Fix space.

* Test

* Remove extra granitemoe file.

* New converted file and tests

* Modified __init__ files.

* Formatting.

* Dummy PT objects

* register granitemoe shared model

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* fix linting of a file

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* fix import in modeling file

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* update generated modeling file

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* add documentation

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* update docstrings

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* update generated modeling file

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* fix docstrings in config class

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* merge main

Signed-off-by: Sukriti-Sharma4 <[email protected]>

---------

Signed-off-by: Shawn Tan <[email protected]>
Signed-off-by: Sukriti-Sharma4 <[email protected]>
Co-authored-by: Shawn Tan <[email protected]>
Co-authored-by: Shawn Tan <[email protected]>
Co-authored-by: Sukriti-Sharma4 <[email protected]>
Co-authored-by: Sukriti Sharma <[email protected]>
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Feb 21, 2025
Modular GraniteMoE with shared Experts. (huggingface#35894)
@EwoutH

EwoutH commented May 4, 2025

Is the public release of Granite-4.0-Tiny-Preview in any way relevant to this PR? (like does it warrant any follow-up work, additional validation/testing/CI, etc.)
