
Conversation

mayank31398
Contributor

This PR adds support for shared experts in the GraniteMoE model class for the upcoming Granite 4.0 language models.
@ArthurZucker
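
For readers unfamiliar with the idea: a shared expert is a dense MLP that every token passes through in addition to its routed experts, with the two outputs summed. Below is a minimal illustrative sketch; the class and parameter names (`SharedExpertMLP`, `shared_intermediate_size`) are assumptions, not necessarily the exact ones in the merged GraniteMoeShared code.

```python
import torch
import torch.nn as nn

class SharedExpertMLP(nn.Module):
    """Sketch of a shared expert: a dense, gated (SwiGLU-style) MLP applied
    to every token, alongside the routed MoE experts. Names are assumptions,
    not the exact ones in the merged GraniteMoeShared code."""

    def __init__(self, hidden_size: int, shared_intermediate_size: int):
        super().__init__()
        # One projection produces both the gate branch and the up branch.
        self.input_linear = nn.Linear(hidden_size, 2 * shared_intermediate_size, bias=False)
        self.output_linear = nn.Linear(shared_intermediate_size, hidden_size, bias=False)
        self.act = nn.SiLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        gate, up = self.input_linear(hidden_states).chunk(2, dim=-1)
        return self.output_linear(self.act(gate) * up)

# Inside a decoder layer the two contributions are summed, roughly:
#   hidden_states = routed_moe(hidden_states) + shared_mlp(hidden_states)
```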

@mayank31398
Contributor Author

@ArthurZucker can you merge this?
All checks have passed.

Collaborator

@ArthurZucker ArthurZucker left a comment


Hey! As always, for us this will be a new model, but with modular it should be super easy to add! https://huggingface.co/docs/transformers/en/modular_transformers 🤗
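
For context, the modular workflow referenced above lets a new model be expressed as a small file of deltas against an existing one, which the repo tooling (`utils/modular_model_converter.py`) expands into a flat, standalone modeling file. A rough, hypothetical sketch of what such a file could look like here; the class names and the `shared_intermediate_size` field are assumptions, not the exact merged code:

```python
# modular_granitemoeshared.py -- hypothetical sketch of a modular definition.
import torch.nn as nn

from transformers.models.granitemoe.configuration_granitemoe import GraniteMoeConfig
from transformers.models.granitemoe.modeling_granitemoe import (
    GraniteMoeDecoderLayer,
    GraniteMoeForCausalLM,
    GraniteMoeModel,
)

class GraniteMoeSharedConfig(GraniteMoeConfig):
    def __init__(self, shared_intermediate_size=1024, **kwargs):
        super().__init__(**kwargs)
        # Width of the always-on shared expert (assumed field name).
        self.shared_intermediate_size = shared_intermediate_size

class GraniteMoeSharedDecoderLayer(GraniteMoeDecoderLayer):
    def __init__(self, config, layer_idx: int):
        super().__init__(config, layer_idx)
        # Stand-in for the shared-expert MLP; its output would be added to
        # the routed-experts output in forward().
        self.shared_mlp = nn.Sequential(
            nn.Linear(config.hidden_size, config.shared_intermediate_size, bias=False),
            nn.SiLU(),
            nn.Linear(config.shared_intermediate_size, config.hidden_size, bias=False),
        )

# Classes that are unchanged apart from the name are simply re-declared;
# the converter rewires internal references when generating the flat file.
class GraniteMoeSharedModel(GraniteMoeModel):
    pass

class GraniteMoeSharedForCausalLM(GraniteMoeForCausalLM):
    pass
```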

@shawntan
Contributor

shawntan commented Feb 5, 2025

We're adding an additional feature (shared experts) that doesn't break past checkpoints, and is an extension of our own model class. Would every extension entail a new model class?

@ArthurZucker
Collaborator

Yes 🤗 I am sorry but that is the way we have been handling every single model so far!

@shawntan
Contributor

shawntan commented Feb 7, 2025

Not sure how to get the tests to pass; some of the failures are not due to the changes I've made.

@mayank31398 mayank31398 marked this pull request as draft February 11, 2025 17:17
@mayank31398 mayank31398 marked this pull request as ready for review February 13, 2025 00:41
@mayank31398
Contributor Author

@ArthurZucker the PR is ready, please review.
The failing tests seem unrelated.

@ArthurZucker ArthurZucker self-requested a review February 13, 2025 09:55
Collaborator

@ArthurZucker ArthurZucker left a comment


Super clean! Super nice!



class GraniteMoeSharedForCausalLM(GraniteMoeForCausalLM):
    _tied_weights_keys = ["lm_head.weight"]
Collaborator


If the MLP is shared, should it appear here?

Contributor Author


No, it shouldn't.
"Shared" means shared in the sense of experts (a dense expert that every token passes through, alongside the routed ones), not shared across layers.
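
To spell out the distinction: `_tied_weights_keys` lists parameters that literally share storage with another parameter (here `lm_head.weight`, tied to the input embeddings), so the checkpoint stores the tensor once. A shared expert, by contrast, is an ordinary per-layer module with its own weights. A small runnable illustration with generic names (not GraniteMoeShared code):

```python
import torch.nn as nn

embed = nn.Embedding(32, 8)
lm_head = nn.Linear(8, 32, bias=False)

# Weight tying: both modules point at the same tensor, which is why
# "lm_head.weight" appears in _tied_weights_keys -- it is skipped at
# save time and re-tied at load time.
lm_head.weight = embed.weight
assert lm_head.weight is embed.weight

# A shared expert is just a per-layer submodule: each layer owns its own
# parameters, nothing is tied, so nothing belongs in _tied_weights_keys.
shared_mlps = nn.ModuleList(nn.Linear(8, 8) for _ in range(2))
assert shared_mlps[0].weight is not shared_mlps[1].weight
```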

@mayank31398
Contributor Author

Thanks for approving, please merge as soon as possible :)

@Ssukriti
Contributor

I have updated with the main branch, made the corresponding changes, and all checks have passed :)

@ArthurZucker ArthurZucker merged commit a570e2b into huggingface:main Feb 14, 2025
23 checks passed
@mayank31398 mayank31398 deleted the shared-experts branch February 14, 2025 15:58
ArthurZucker pushed a commit that referenced this pull request Feb 17, 2025
* Modular GraniteMoE with shared Experts.

Signed-off-by: Shawn Tan <[email protected]>

* Modified

* Import order.

* Modified for style

* Fix space.

* Test

* Remove extra granitemoe file.

* New converted file and tests

* Modified __init__ files.

* Formatting.

* Dummy PT objects

* register granitemoe shared model

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* fix linting of a file

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* fix import in modeling file

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* update generated modeling file

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* add documentation

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* update docstrings

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* update generated modeling file

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* fix docstrings in config class

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* merge main

Signed-off-by: Sukriti-Sharma4 <[email protected]>

---------

Signed-off-by: Shawn Tan <[email protected]>
Signed-off-by: Sukriti-Sharma4 <[email protected]>
Co-authored-by: Shawn Tan <[email protected]>
Co-authored-by: Shawn Tan <[email protected]>
Co-authored-by: Sukriti-Sharma4 <[email protected]>
Co-authored-by: Sukriti Sharma <[email protected]>
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Feb 21, 2025
Modular GraniteMoE with shared Experts. (huggingface#35894)
@EwoutH

EwoutH commented May 4, 2025

Is the public release of Granite-4.0-Tiny-Preview in any way relevant to this PR? (like does it warrant any follow-up work, additional validation/testing/CI, etc.)
