
Improve Gemma3n model and tests #39764


Open · wants to merge 27 commits into main

Conversation

manueldeprada (Contributor) commented on Jul 29, 2025:

Improves the Gemma3n model and tests by:

  • Removing the hardcoded number of layers from the activation sparsity init.
  • Adding a better explanation for the layer reuse.
  • Enabling and updating the integration tests.
  • Removing unused pan-and-scan configuration options from the ImageProcessor.
  • Skipping some incompatible tests.

@@ -227,7 +227,7 @@ def __init__(
altup_num_inputs: int = 4,
num_kv_shared_layers: int = 15,
laurel_rank: int = 64,
activation_sparsity_pattern: Optional[Union[float, Sequence[float]]] = (0.95,) * 10 + (0.0,) * 25,
manueldeprada (Contributor, author) commented:
having the number of layers hardcoded is no good

Member commented:
Yeah, agreed! Also, imo having it as a single float 0.95 isn't very intuitive; we can default to None in the signature and later do: if None: pattern = (0.95,) * 10 [....]

manueldeprada (Contributor, author) commented on Jul 31, 2025:

Agreed, I thought the same. The problem with doing that is that it changes the behavior of None, which people in the wild might rely on:

    if activation_sparsity_pattern is None:
        activation_sparsity_pattern = [0.0] * num_hidden_layers

Maybe default to -1 or an empty tuple? Or do you think it is safe to change the None behavior?
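
For illustration only, a minimal sketch of the resolution logic being discussed, assuming the config exposes num_hidden_layers; the _UNSET sentinel, the helper name, and the 10-layer split are hypothetical and deliberately keep the existing meaning of None (no sparsity) intact:

    from typing import Sequence, Union

    _UNSET = object()  # hypothetical sentinel so the public None behavior stays untouched

    def resolve_activation_sparsity_pattern(
        pattern: Union[float, Sequence[float], None, object],
        num_hidden_layers: int,
    ) -> list[float]:
        """Expand the config value into one sparsity level per layer."""
        if pattern is _UNSET:
            # Default: 0.95 on the first 10 layers, dense afterwards,
            # derived from num_hidden_layers instead of a hardcoded 35-layer tuple.
            return [0.95] * min(10, num_hidden_layers) + [0.0] * max(0, num_hidden_layers - 10)
        if pattern is None:
            # Preserve the current meaning of None: no sparsity anywhere.
            return [0.0] * num_hidden_layers
        if isinstance(pattern, (int, float)):
            return [float(pattern)] * num_hidden_layers
        if len(pattern) != num_hidden_layers:
            raise ValueError(
                f"activation_sparsity_pattern has {len(pattern)} entries "
                f"but the model has {num_hidden_layers} layers."
            )
        return list(pattern)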

@@ -659,7 +658,6 @@ def test_automodelforcausallm(self):
self.assertIsInstance(for_causal_lm, Gemma3nForCausalLM)


@unittest.skip("Skipped for now!")
manueldeprada (Contributor, author) commented on Jul 29, 2025:

these tests were copied from gemma3 and were skipped. I updated and enabled them.
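
As an illustration of the kind of re-enabled integration test, a rough sketch assuming a hypothetical Gemma3n checkpoint id; the class name, checkpoint, and assertions here are illustrative rather than the PR's actual test code:

    import unittest

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, Gemma3nForCausalLM


    class Gemma3nIntegrationSketch(unittest.TestCase):
        model_id = "google/gemma-3n-E2B-it"  # hypothetical checkpoint id

        def test_automodelforcausallm_dispatch(self):
            # AutoModelForCausalLM should resolve to the Gemma3n-specific class.
            model = AutoModelForCausalLM.from_pretrained(self.model_id, torch_dtype=torch.bfloat16)
            self.assertIsInstance(model, Gemma3nForCausalLM)

        def test_short_generation(self):
            tokenizer = AutoTokenizer.from_pretrained(self.model_id)
            model = AutoModelForCausalLM.from_pretrained(self.model_id, torch_dtype=torch.bfloat16)
            inputs = tokenizer("Hello, my name is", return_tensors="pt")
            out = model.generate(**inputs, max_new_tokens=5, do_sample=False)
            # Check only the length; the exact text depends on the checkpoint.
            self.assertEqual(out.shape[1], inputs["input_ids"].shape[1] + 5)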

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor commented:

This comment contains run-slow, running the specified jobs:

models: ['models/gemma3', 'models/gemma3n']
quantizations: [] ...

manueldeprada (Contributor, author) commented:

run-slow: gemma3n, gemma3

Contributor commented:

This comment contains run-slow, running the specified jobs:

models: ['models/gemma3', 'models/gemma3n']
quantizations: [] ...

manueldeprada (Contributor, author) commented on Jul 31, 2025:

All gemma3n tests passing!! Thanks a lot @ydshieh for the help!! This is ready to merge :)

(There is only one gemma3 custom test failing, due to multiple GPUs.)

Unsure who to tag for review; let me know if I didn't hit the gemma3n experts :)

@@ -875,12 +859,13 @@ def test_model_1b_text_only(self):
@require_flash_attn
@require_torch_gpu
@pytest.mark.flash_attn_test
@unittest.skip("Timm models do not support Flash Attention 2 yet")
Member commented:

then let's delete the test or use the CausalLM
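
For the second option, a rough sketch of what a text-only Flash Attention 2 path could look like, assuming Gemma3nForCausalLM exercises only the text stack so no Timm vision tower is built; the checkpoint id, prompt, and generation settings are illustrative:

    import torch
    from transformers import AutoTokenizer, Gemma3nForCausalLM

    model_id = "google/gemma-3n-E2B-it"  # hypothetical checkpoint id

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = Gemma3nForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",  # applies to the text decoder only
        device_map="auto",
    )

    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    print(tokenizer.decode(output[0], skip_special_tokens=True))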

Contributor commented:

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma3n
