Improve Gemma3n model and tests #39764
Conversation
@@ -227,7 +227,7 @@ def __init__(
        altup_num_inputs: int = 4,
        num_kv_shared_layers: int = 15,
        laurel_rank: int = 64,
        activation_sparsity_pattern: Optional[Union[float, Sequence[float]]] = (0.95,) * 10 + (0.0,) * 25,
Having the number of layers hardcoded is no good.
Yeah, agreed! Also, imo having it as a single float 0.95 isn't very intuitive; we could default to None in the signature and later do if None: pattern = (0.95,) * 10 [...]
Agreed, I thought the same. The problem with doing that is that it changes the behavior of None, which people might rely on in the wild:

transformers/src/transformers/models/gemma3n/configuration_gemma3n.py, lines 291 to 292 in 83f2599:

        if activation_sparsity_pattern is None:
            activation_sparsity_pattern = [0.0] * num_hidden_layers

Maybe default to -1 or an empty tuple? Or do you think it is safe to change the None behaviour?
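For illustration, here is a minimal sketch of the sentinel approach being discussed, assuming the goal is a default tied to num_hidden_layers while an explicit None keeps its current all-dense meaning. The class name and validation are hypothetical, not the PR's actual change:

```python
from typing import Optional, Sequence, Union

_UNSET = object()  # sentinel: distinguishes "argument omitted" from an explicit None


class Gemma3nTextConfigSketch:
    def __init__(
        self,
        num_hidden_layers: int = 35,
        activation_sparsity_pattern=_UNSET,  # float, sequence of floats, None, or omitted
    ):
        if activation_sparsity_pattern is _UNSET:
            # library default: sparsity on the first 10 layers, dense afterwards
            activation_sparsity_pattern = (0.95,) * 10 + (0.0,) * (num_hidden_layers - 10)
        elif activation_sparsity_pattern is None:
            # preserve today's None behavior: every layer dense
            activation_sparsity_pattern = (0.0,) * num_hidden_layers
        elif isinstance(activation_sparsity_pattern, float):
            # a single float applies to every layer
            activation_sparsity_pattern = (activation_sparsity_pattern,) * num_hidden_layers

        if len(activation_sparsity_pattern) != num_hidden_layers:
            raise ValueError(
                f"activation_sparsity_pattern must have {num_hidden_layers} entries, "
                f"got {len(activation_sparsity_pattern)}"
            )

        self.num_hidden_layers = num_hidden_layers
        self.activation_sparsity_pattern = tuple(activation_sparsity_pattern)
```

The sentinel only lives in the signature; the stored attribute is always a concrete tuple, so config serialization round-trips and the existing None contract are unaffected.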
@@ -659,7 +658,6 @@ def test_automodelforcausallm(self):
        self.assertIsInstance(for_causal_lm, Gemma3nForCausalLM)


    @unittest.skip("Skipped for now!")
These tests were copied from gemma3 and were skipped; I updated and enabled them.
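For context, enabling one of those copied tests roughly amounts to dropping the skip decorator and pointing the assertion at the Gemma3n classes. A rough sketch only; the checkpoint name and class name here are illustrative, not the PR's exact test:

```python
import unittest

from transformers import AutoModelForCausalLM, Gemma3nForCausalLM
from transformers.testing_utils import require_torch


@require_torch
class Gemma3nIntegrationSketch(unittest.TestCase):
    # Previously this carried @unittest.skip("Skipped for now!") from the gemma3 copy;
    # the skip is removed and the assertion targets the Gemma3n class instead.
    def test_automodelforcausallm(self):
        checkpoint = "google/gemma-3n-E2B-it"  # illustrative checkpoint name
        for_causal_lm = AutoModelForCausalLM.from_pretrained(checkpoint)
        self.assertIsInstance(for_causal_lm, Gemma3nForCausalLM)
```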
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
run-slow: gemma3n, gemma3
This comment contains run-slow, running the specified jobs: models: ['models/gemma3', 'models/gemma3n']
All Gemma 3n tests passing! Thanks a lot @ydshieh for the help! This is ready to merge :) (There is only a gemma3 custom test failing due to multiple GPUs.) Unsure who to tag for review, lmk if I didn't hit the gemma3n experts :)
@@ -875,12 +859,13 @@ def test_model_1b_text_only(self):
    @require_flash_attn
    @require_torch_gpu
    @pytest.mark.flash_attn_test
    @unittest.skip("Timm models do not support Flash Attention 2 yet")
Then let's delete the test or use the CausalLM.
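If the test is kept, one possible direction along the lines of this suggestion is to exercise Flash Attention 2 through the text-only CausalLM path, so the Timm vision tower is never instantiated. A hedged sketch, assuming the text stack supports FA2 and that AutoModelForCausalLM resolves the multimodal checkpoint to the text-only model; the checkpoint name is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3n-E2B-it"  # illustrative checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Text-only load: the Timm-based vision tower (no FA2 support) is not built.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")

inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```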
[For maintainers] Suggested jobs to run (before merge): run-slow: gemma3n
Improves the Gemma3n model and tests by: