Commit 82ffeb2
Add Top-H decoding (entropy-bounded truncation) as a LogitsWarper for text generation (#40837)
* init
* added TopH
* Update TopH logits_process.py
* Update logits_process.py
* Update test_logits_process.py
* Update test_logits_process.py
* added test No. 4
* Resolving __init__.py issues
* Resolving configuration_utils.py Issues
* Resolving logits_process.py Issues
* Resolving utils.py Issues
* Resolving test_logits_process.py Issues
* Resolving __init__.py issues
* Resolving logits_process.py Issues
* Resolving __init__.py issues
* Updated Docs
* Updated Docstring
* style: autoformat with make fixup
* Fixing Docstring
* Update logits_process.py removed defaults
* Rename variable H -> cumulative_entropy
* Using torch.distributions.Categorical
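The commit trail above outlines the idea: sort token probabilities, accumulate their per-token entropy contributions, and truncate once a bound is exceeded (the commits suggest the merged code computes entropy via `torch.distributions.Categorical`). A minimal standalone sketch — the name `top_h_filter`, the `entropy_budget` parameter, and the exact stopping rule are illustrative assumptions, not the merged `TopH` implementation:

```python
import torch

def top_h_filter(logits: torch.Tensor, entropy_budget: float = 2.0) -> torch.Tensor:
    """Keep the smallest high-probability prefix whose cumulative entropy
    contribution -p*log(p) stays within `entropy_budget`; mask the rest."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    # Per-token entropy contribution, accumulated over the sorted vocab.
    contrib = -sorted_probs * torch.log(sorted_probs.clamp_min(1e-12))
    cumulative_entropy = contrib.cumsum(dim=-1)
    keep = cumulative_entropy <= entropy_budget
    keep[..., 0] = True  # always keep the argmax token
    # Scatter the keep mask back to the original vocab order.
    keep_orig = torch.zeros_like(keep)
    keep_orig.scatter_(-1, sorted_idx, keep)
    return logits.masked_fill(~keep_orig, float("-inf"))
```

A tight budget keeps only the head of the distribution; a loose one degenerates toward plain sampling.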
* Improve torch_dtype checks (#40808)
* Improve torch_dtype checks
Signed-off-by: Yuanyuan Chen <[email protected]>
* Apply suggestions from code review
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Joao Gante <[email protected]>
* Add VideoProcessors to auto-backend requirements (#40843)
* add it
* fix existing ones
* add perception to auto_mapping...
* Adds Causal Conv 1D kernel for mamba models (#40765)
* add kernel
* make style
* keep causal-conv1d
* small fix
* small fix
* fix modular converter
* modular fix + lazy loading
* revert changes modular
* nit
* hub kernels update
* update
* small nit
* Update no split modules in T5Gemma model (#40810)
* Update no split modules in T5Gemma model
* Update no_split_modules also for T5Gemma modular
* Remove model_split_percents from test cases
---------
Co-authored-by: Anton Vlasjuk <[email protected]>
* Replace image classification loss functions to `self.loss_function` (#40764)
* Fix the misalignment between the l2norm in GDN of Qwen3-Next and the implementation in the FLA library. (#40842)
* align torch implementation of gdn with fla.
* fix fla import.
* fix
* remove unused attr
* fixes
* strictly align l2norm in Qwen3-Next with FLA implementation.
---------
Co-authored-by: bozheng-hit <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>
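For reference, an l2norm of the kind used in gated-delta-net code normalizes along the last (head) dimension with a fused reciprocal square root. A sketch under stated assumptions — the eps value and its placement are illustrative; the point of this PR was matching FLA's exact formulation, which may differ:

```python
import torch

def l2norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Normalize along the last dim: x / sqrt(sum(x^2) + eps),
    # written as x * rsqrt(...) to use one fused op.
    return x * torch.rsqrt(x.pow(2).sum(dim=-1, keepdim=True) + eps)
```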
* Fixes for continuous batching (#40828)
* Fix for CB attn mask and refactor
* Tests for CB (not all passing)
* Passing tests and a logger fix
* Fixed the KV metrics that were broken when we moved to hybrid alloc
* Fix circular import and style
* Added tests for FA
* Unfolded test to have device expectations
* Fixes for H100
* more fixes for h100
* H100 are good
* Style
* Adding some comments from #40831
* Rename test
* Avoid 1 letter variables
* Dictionary is only removed during kwargs
* Test for supported sample
* Fix an involuntary slice
* Fixes for non-sliced inputs and small example improvements
* Slice inputs is more understandable
* Style
* [tests] re-enable aria fast tests (#40846)
* rise from the dead
* test
* [SAM2] Fix inconsistent results with original implementation with input boxes (#40800)
* Fix inconsistencies with box input inference with original repo
* remove print
* always pad
* fix modular
* [Sam2Video] Fix video inference with batched boxes and add test (#40797)
fix video inference with batched boxes and add test
* add: differential privacy research model (#40851)
* VaultGemma
* Removing Sequence and Token classification models. Removing integration tests for now
* Remove pass-only modular code. style fixes
* Update vaultgemma.md
* Update docs/source/en/model_doc/vaultgemma.md
Co-authored-by: Anton Vlasjuk <[email protected]>
* Update docs/source/en/model_doc/vaultgemma.md
Co-authored-by: Anton Vlasjuk <[email protected]>
* Add links to model doc
* Correct model doc usage examples
* Updating model doc to describe differences from Gemma 2
* Update model_doc links
* Adding integration tests
* style fixes
* repo consistency
* attribute exception
---------
Co-authored-by: Amer <[email protected]>
Co-authored-by: Anton Vlasjuk <[email protected]>
* [test] Fix test_eager_matches_sdpa incorrectly skipped (#40852)
* output_attentions in typed kwargs
* correct typing in GenericForTokenClassification
* improve
* [tests] move generative tests away from `test_modeling_common.py` (#40854)
move tests
* [generate] Always use decoder config to init cache (#40772)
* mega derp
* fix
* always use the decoder
* Use checkpoint in auto_class_docstring (#40844)
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix TrainingArguments.parallelism_config NameError with accelerate<1.10.1 (#40818)
Fix ParallelismConfig type for accelerate < 1.10.1
Co-authored-by: Marc Sun <[email protected]>
* Redirect MI355 CI results to dummy dataset (#40862)
* [Bug fix #40813] Fix base_model_tp_plan of Starcoder2 model. (#40814)
Signed-off-by: greg-kwasniewski1 <[email protected]>
* [docstrings / type hints] Update outdated annotations for `past_key_values` (#40803)
* some fixes
* nits
* indentation
* indentation
* a bunch of type hints
* bulk changes
* fix florence kwargs (#40826)
* fix: XIELU act parameters not being casted to correct dtype (#40812)
* Update model tags and integration references in bug report (#40881)
* [Qwen3 Next] Use numerically stable `rsqrt` (#40848)
use numerically stable inverse
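An RMSNorm-style sketch of the pattern (illustrative, not the exact Qwen3-Next code): multiplying by `torch.rsqrt(variance + eps)` keeps eps inside the root and avoids a separate division whose denominator could round toward zero.

```python
import torch

def rms_scale(hidden_states: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Numerically stable inverse: rsqrt(mean(x^2) + eps) is a single fused
    # op and never divides by a near-zero intermediate, unlike
    # hidden_states / torch.sqrt(variance).
    variance = hidden_states.pow(2).mean(-1, keepdim=True)
    return hidden_states * torch.rsqrt(variance + eps)
```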
* Adding Support for Qwen3-VL Series (#40795)
* add qwen3vl series
* make fixup
* fix import
* re-protect import
* fix it finally (need to merge main into the branch)
* skip processor test (need the checkpoint)
* oups typo
* simplify modular
* remove unnecessary attr
* fix layer
* remove unused rope_deltas args
* reuse image def
* remove unnecessary imports
---------
Co-authored-by: Cyril Vallez <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>
* [`VaultGemma`] Update expectations in integration tests (#40855)
* fix tests
* style
* Fix modular consistency (#40883)
* reapply modular
* add missing one
* 🔴 Move variable output controls to `_prepare_generation_config` (#40715)
* move checks to validate steps where possible
* fix csm and other models that override _sample
* ops dia you again
* opsie
* joao review
* Move variable output controls to `prepare_inputs_for_generation`
* fix a bunch of models
* back to basics
* final touches
* Clarify passing is_causal in sdpa_attention_paged_forward (#40838)
* Correctly pass is_causal in sdpa_attention_paged_forward
Signed-off-by: Yuanyuan Chen <[email protected]>
* Improve typing
Signed-off-by: Yuanyuan Chen <[email protected]>
* Add comment
Signed-off-by: Yuanyuan Chen <[email protected]>
* Improve comments
Signed-off-by: Yuanyuan Chen <[email protected]>
* Revert typing
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
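The distinction this PR clarifies: `is_causal=True` asks the SDPA kernel to build the causal mask itself, and should not be combined with an explicit mask carrying the same information. A minimal equivalence check with plain `F.scaled_dot_product_attention` (not the paged variant from this PR):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
q = torch.randn(1, 2, 4, 8)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)

# is_causal=True: the kernel applies a lower-triangular mask internally.
out_flag = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Equivalent explicit boolean mask (True = attend).
causal = torch.tril(torch.ones(4, 4, dtype=torch.bool))
out_mask = F.scaled_dot_product_attention(q, k, v, attn_mask=causal)
```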
* Use torch.expm1 and torch.log1p for better numerical results (#40860)
Signed-off-by: Yuanyuan Chen <[email protected]>
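The motivation for this change: for tiny x, `exp(x) - 1` and `log(1 + x)` suffer catastrophic cancellation, while `expm1`/`log1p` stay accurate. The torch functions have the same semantics; plain `math` is shown here for brevity:

```python
import math

x = 1e-17  # below double-precision epsilon relative to 1.0
naive = math.exp(x) - 1.0      # 1.0 + 1e-17 rounds to 1.0, so this is 0.0
accurate = math.expm1(x)       # ~1e-17, no cancellation
log_naive = math.log(1.0 + x)  # also rounds to 0.0
log_accurate = math.log1p(x)   # ~1e-17
```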
* Add Fast PromptDepthAnything Processor (#40602)
* Test & import setup
* First version passing tests
* Ruff
* Dummy post processing
* Add numerical test
* Adjust
* Doc
* Ruff
* remove unused arg
* Refine interpolation method and push test script
* update bench
* Comments
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Yoni Gozlan <[email protected]>
* Remove benchmark script
* Update docstrings
* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py
Co-authored-by: Yoni Gozlan <[email protected]>
* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py
Co-authored-by: Yoni Gozlan <[email protected]>
* doc
* further process kwargs
* remove it
* remove
* Remove to dict
* remove crop middle
* Remove param specific handling
* Update testing logic
* remove ensure multiple of as kwargs
* fix formatting
* Remove none default and get image size
* Move stuff to _preprocess_image_like_inputs and refacto
* Clean
* ruff
* End of file & comments
* ruff again
* Padding fixed
* Remove comments to pass tests
* Remove prompt depth from kwargs
* Adjust output_size logic
* Docstring for preprocess
* auto_docstring for preprocess
* pass as an arg
* update test batched
* stack images
* remove prompt scale to meter
* return tensors back in preprocess
* remove copying of images
* Update behavior to match old processor
* Fix batch size of tests
* fix test and fast
* Fix slow processor
* Put tests back to pytorch
* remove check and modify batched tests
* test do_pad + slow processor fix
---------
Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: yonigozlan <[email protected]>
* Fix deta loading & dataclass (#40878)
* fix
* fix 2
* Remove dict branch of attention_mask in sdpa_attention_paged_forward (#40882)
Remove dict branch of attention_mask
Signed-off-by: Yuanyuan Chen <[email protected]>
* 🌐 [i18n-KO] Translated smolvlm.md to Korean (#40414)
* fix: manual edits
* Apply suggestions from code review
* Update docs/source/ko/model_doc/smolvlm.md
* Update docs/source/ko/model_doc/smolvlm.md
* Update docs/source/ko/model_doc/smolvlm.md
* Update docs/source/ko/model_doc/smolvlm.md
* Update docs/source/ko/_toctree.yml
Co-authored-by: Steven Liu <[email protected]>
---------
Co-authored-by: Steven Liu <[email protected]>
* 🌐 [i18n-KO] Translated `imageprocessor.md` to Korean (#39557)
* feat: manual translation
* docs: fix ko/_toctree.yml
* Apply suggestions from code review
Co-authored-by: YONGSANG <[email protected]>
Co-authored-by: Yijun Lee <[email protected]>
* Update docs/source/ko/image_processors.md
Co-authored-by: Steven Liu <[email protected]>
---------
Co-authored-by: YONGSANG <[email protected]>
Co-authored-by: Yijun Lee <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
* [generate] remove docs of a feature that no longer exists (#40895)
* Make debugging failing tests (check and update expect output values) easier 🔥 (#40727)
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Fixing the call to kernelize (#40628)
* fix
* style
* overload train and eval
* add getter and setter
* Fix getter regression (#40824)
* test things
* style
* move tests to a sane place
* Fix flaky `Gemma3nAudioFeatureExtractionTest::test_dither` (#40902)
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* [cache] Merge static sliding and static chunked layer (#40893)
* merge
* get rid of tensors in get_mask_sizes!!
* remove branch
* add comment explanation
* re-add the class with deprecation cycle
* Harmonize CacheLayer names (#40892)
* unify naming
* style
* doc as well
* post rebase fix
* style
* style
* revert
* [cache] Only use scalars in `get_mask_sizes` (#40907)
* remove tensor ops
* style
* style
* Set seed for `Glm4vIntegrationTest` (#40905)
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Add Olmo3 model (#40778)
* transformers add-new-model-like for Olmo3
* Implement modular Olmo3
* Update Olmo3 tests
* Copy Olmo2 weight converter to Olmo3
* Implement Olmo3 weight converter
* Fix code quality errors
* Remove unused import
* Address rope-related PR comments
* Update Olmo3 model doc with minimal details
* Fix Olmo3 rope test failure
* Fix 7B integration test
* remove dummy EncodingFast (#40864)
Signed-off-by: Yuanyuan Chen <[email protected]>
* Improve module name handling for local custom code (#40809)
* Improve module name handling for local custom code
* Use `%lazy` in logging messages
* Revert "Use `%lazy` in logging messages"
This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.
* Add notes for sanitization rule in docstring
* Remove too many underscores
* Update src/transformers/dynamic_module_utils.py
* Update src/transformers/dynamic_module_utils.py
---------
Co-authored-by: Matt <[email protected]>
* Remove `runner_map` (#40880)
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* disable `test_fast_is_faster_than_slow` (#40909)
fix
Co-authored-by: ydshieh <[email protected]>
* [gemma3] `Gemma3ForConditionalGeneration` compatible with assisted generation (#40791)
* gemma3vision compatible with assisted generation
* docstring
* BC
* docstring
* failing checks
* make fixup
* apply changes to modular
* misc fixes
* is_initialized
* fix poor rebase
* [generate] misc fixes (#40906)
misc fixes
* 🔴Make `center_crop` fast equivalent to slow (#40856)
make center_crop fast equivalent to slow
* Fix dtype in Paligemma (#40912)
* fix dtypes
* fix copies
* delete unused attr
* [Docs] Adding documentation of MXFP4 Quantization (#40885)
* adding mxfp4 quantization docs
* review suggestions
* Apply suggestions from code review
Co-authored-by: vb <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
---------
Co-authored-by: vb <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
* Processor load with multi-processing (#40786)
push
* [Llama4] Remove `image_sizes` arg and deprecate `vision_feature_layer` (#40832)
* Remove unused arg
* deprecate
* revrt one change
* get set go
* version correction
* fix
* make style
* comment
* Fix #40067: Add dedicated UMT5 support to GGUF loader (config, tokenizer, test) (#40218)
* Fix #40067: add UMT5 support in GGUF loader (config, tokenizer, test)
* chore: fix code formatting and linting issues
* refactor: move UMT5 GGUF test to quantization directory and clean up comments
* chore: trigger CI pipeline
* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.
* Add regression check to UMT5 encoder GGUF test
Verify encoder output against reference tensor values with appropriate tolerances for stability.
* Update tests/quantization/ggml/test_ggml.py
Co-authored-by: Mohamed Mekkouri <[email protected]>
* Update tests/quantization/ggml/test_ggml.py
remove comments
Co-authored-by: Mohamed Mekkouri <[email protected]>
---------
Co-authored-by: Mohamed Mekkouri <[email protected]>
* [torchao safetensors] renaming get_state_dict function (#40774)
renaming get_state_dict function
Co-authored-by: Mohamed Mekkouri <[email protected]>
* Adding activation kernels (#40890)
* first commit
* add mode
* revert modeling
* add compile
* rm print
* Minor fix for #40727 (#40929)
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Add support for Florence-2 training (#40914)
* Support training florence2
* update doc and testing model to florence-community
* fix florence-2 test, use head dim 16 instead of 8 for fa2
* skip test_sdpa_can_dispatch_on_flash
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add LongCat-Flash (#40730)
* working draft for LongCat
* BC changes to deepseek_v3 for modular
* format
* various modularities
* better tp plan
* better init
* minor changes
* make modular better
* clean up patterns
* Revert a couple of modular commits, because we won't convert in the end
* make things explicit.
* draft test
* toctree, tests and imports
* drop
* woops
* make better things
* update test
* update
* fixes
* style and CI
* convert stuff
* up
* ah, yes, that
* enable gen tests
* fix cache shape in test (sum of 2 things)
* fix tests
* comments
* re-Identitise
* minimize changes
* better defaults
* modular betterment
* fix configuration, add documentation
* fix init
* add integration tests
* add info
* simplify
* update slow tests
* fix
* style
* some additional long tests
* cpu-only long test
* fix last tests?
* urg
* cleaner tests why not
* fix
* improve slow tests, no skip
* style
* don't upcast
* one skip
* finally fix parallelism
* [DOC] Add missing dates in model cards (#40922)
add missing dates
* [models] remove unused `import torch.utils.checkpoint` (#40934)
* Intel CPU dockerfile (#40806)
* upload intel cpu dockerfile
Signed-off-by: jiqing-feng <[email protected]>
* update cpu dockerfile
Signed-off-by: jiqing-feng <[email protected]>
* update label name
Signed-off-by: jiqing-feng <[email protected]>
---------
Signed-off-by: jiqing-feng <[email protected]>
* docs(i18n): Correct the descriptive text in the README_zh-hans.md (#40941)
* Fix trainer tests (#40823)
* fix liger
* fix
* more
* fix
* fix hp
* fix
---------
Co-authored-by: Matej Sirovatka <[email protected]>
* Fix `Glm4vMoeIntegrationTest` (#40930)
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Raise error instead of warning when using meta device in from_pretrained (#40942)
* raise instead of warning
* add timm
* remove
* Consistent naming for images kwargs (#40834)
* use consistent naming for padding
* no validation on pad size
* add warnings
* fix
* fix copies
* another fix
* fix some tests
* fix more tests
* fix lasts tests
* fix copies
* better docstring
* delete print
* Remove nested import logic for torchvision (#40940)
* remove nested import logic for torchvision
* remove unnecessary protected imports
* remove unnecessary protected import in modular (and modeling)
* fix wrongly removed protected imports
* Fix `Glm4vModelTest::test_eager_matches_fa2_generate` (#40947)
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Update expected values for some `test_speculative_generation` (#40949)
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Standardize audio embedding function name for audio multimodal models (#40919)
* Standardize audio embedding function name for audio multimodal models
* PR review
* Add FlexOlmo model (#40921)
* transformers add-new-model-like
* Add FlexOlmo implementation
* Update FlexOlmo docs
* Set default tokenization for flex olmo
* Update FlexOlmo tests
* Update attention comment
* Remove unneeded use of `sliding_window`
* Don't list dropout in eager_paged_attention_forward (#40924)
Remove dropout argument
Signed-off-by: Yuanyuan Chen <[email protected]>
* Update expected values for one more `test_speculative_generation` after #40949 (#40967)
fix
Co-authored-by: ydshieh <[email protected]>
* FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)
* fix(trainer): ensure final checkpoint is saved when resuming training
* add test
* make style && slight fix of test
* make style again
* move test code to test_trainer
* remove outdated test file
* Apply style fixes
---------
Co-authored-by: rangehow <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <[email protected]>
* Add new model LFM2-VL (#40624)
* Add LFM2-VL support
* add tests
* linting, formatting, misc review changes
* add siglip2 to auto config and instantiate it in lfm2-vl configuration
* decouple image processor from processor
* remove torch import from configuration
* replace | with Optional
* remove layer truncation from modeling file
* fix copies
* update everything
* fix test case to use tiny model
* update the test cases
* fix finally the image processor and add slow tests
* fixup
* typo in docs
* fix tests
* the doc name uses underscore
* address comments from Yoni
* delete tests and unshuffling
* relative import
* do we really handle imports better now?
* fix test
* slow tests
* found a bug in ordering + slow tests
* fix copies
* dont run compile test
---------
Co-authored-by: Anna <[email protected]>
Co-authored-by: Anna Banaszak <[email protected]>
* Fix outdated version checks of accelerator (#40969)
* Fix outdated version checks of accelerator
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix outdated version checks of accelerator
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
* Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)
use skip_predictor in vjepa2 `get_vision_features`
* [Trainer] Fix DP loss (#40799)
* fix
* style
* Fix fp16
* style
---------
Co-authored-by: Matej Sirovatka <[email protected]>
* [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)
* fix(timm): Add exception handling for unknown Gemma3n model
* nit: Let’s cater to this specific issue
* nit: Simplify error handling
* Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)
* fix merge conflicts
* change token typing
---------
Co-authored-by: Ubuntu <[email protected]>
* [tests] Really use small models in all fast tests (#40945)
* start
* xcodec
* chameleon
* start
* layoutlm2
* layoutlm
* remove skip
* oups
* timm_wrapper
* add default
* doc
* consistency
* Add captured actual outputs to CI artifacts (#40965)
* fix
* fix
* Remove `# TODO: ???` as it makes me `???`
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Revert change in `compile_friendly_resize` (#40645)
fix
* Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Remove `set_model_tester_for_less_flaky_tests` (#40982)
remove
* Benchmarking v2 GH workflows (#40716)
* WIP benchmark v2 workflow
* Container was missing
* Change to sandbox branch name
* Wrong place for image name
* Variable declarations
* Remove references to file logging
* Remove unnecessary step
* Fix deps install
* Syntax
* Add workdir
* Add upload feature
* typo
* No need for hf_transfer
* Pass in runner
* Runner config
* Runner config
* Runner config
* Runner config
* Runner config
* mi325 caller
* Name workflow runs properly
* Copy-paste error
* Add final repo IDs and schedule
* Review comments
* Remove wf params
* Remove parametrization from workflow files
* Fix callers
* Change push trigger to pull_request + label
* Add back schedule event
* Push to the same dataset
* Simplify parameter description
* 🔴[`Attention`] Bert-based Models Attention Refactor (#38301)
* clean start to bert refactor
* some test fixes
* style
* fix last tests
* be strict on positional embeddings, fixup according tests
* cache support
* more cache fixes, new causal API
* simplify masks, fix tests for gen
* flex attn, static cache support, round of fixes
* ?
* this time
* style
* fix flash attention tests, flex attention requires torch 2.7.x to work with multiple classes (as recompile strats force a size call which is wrongly interpreted before)
* roberta
* fixup sdpa remains
* attention split, simplify args and kwargs, better typing
* fix encoder decoder
* fix test
* modular roberta
* albert
* data2vectext, making it modular tomorrow
* modular data2vec text
* tmp disable
* xmod + cache position fixes
* whoops
* electra + markuplm, small fixes
* remove wrong copy
* xlm_roberta + some embedding fixes
* roberta prelayernorm
* RemBert: remove copy, maybe doing it later
* ernie
* fix roberta offloading
* camembert
* copy fixes
* bert generation + fixes on eager
* xlm roberta xl
* bridgetower (text) + seamlessv2 copy fixes
* rocbert + small fixes
* whoops
* small round of fixups
* NOTE: kernels didn't load with an earlier version, some fixup (needs another look bc cross deps)
* the end of the tunnel?
* fixup nllbmoe + style
* we don't need this anymore
* megatron bert is barely used, low prio skip for now
* Modernize bert (template for others)
NOTE: trying to push this through; might be overdue if not possible in time
* check inputs for all others (if checkmarked)
* fix bridgetower
* style
* fix encoder decoder (partially but cause found and fix also, just needs to be done for everything else)
* proper fix for bert to force intermediate dict outputs
* propagate to others
* style
* xlm roberta xl investigation, its the layernorm...
* mobile bert
* revert this, might cause issues with composed models
* review
* style
* Remove [[autodoc]] refs to TF/Flax objects (#40996)
* remove refs
* more
* ENH: Enable readline support for transformers chat (#40911)
ENH Enable readline support for chat
This small change enables GNU readline support for the transformers chat
command. This includes, among others:
- advanced navigation and editing: ctrl+a, ctrl+e, alt+b, alt+f, ctrl+k, alt+d, etc.
- navigate and search history: arrow up/down, ctrl+p, ctrl+n, ctrl+r
- undo: ctrl+_
- clear screen: ctrl+l
Implementation
Although it may look strange, just importing readline is enough to
enable it in Python, see:
https://docs.python.org/3/library/functions.html#input
As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.
Readline should work on Linux, macOS, and WSL; I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline3 (https://pypi.org/project/pyreadline3/).
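The guarded import described above looks roughly like this (a sketch; the actual chat command's surrounding structure differs):

```python
# Merely importing readline upgrades Python's built-in input() with line
# editing and history; guard it because the module is missing on some
# platforms (e.g. Windows builds without pyreadline3).
try:
    import readline  # noqa: F401
    HAS_READLINE = True
except ImportError:
    HAS_READLINE = False

# input("chat> ") now supports ctrl+a / ctrl+e navigation and arrow-key
# history wherever HAS_READLINE is True.
```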
* [testing] test `num_hidden_layers` being small in model tester (#40992)
fix
Co-authored-by: ydshieh <[email protected]>
* blt wip (#38579)
* blt wip
* cpu version
* cpu friendly with full entropy model (real time patching)
* adding config file instead of args file
* enable MPS
* refactoring unused code
* single config class in config file
* inherit from PreTrainedModel
* refactor LMTransformer --> BLTPatcher
* add conversion script
* load from new checkpoint with from_pretrained
* fixed demo from_pretrained
* clean up
* clean a few comments
* cleanup folder
* clean up dir
* cleaned up modeling further
* rename classes
* adding transformers Attention class and RotaryEmbedding class
* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc
* separate out patcher config, update modeling and conversion script
* rename vars to be more transformers-like
* rm unused functions
* adding cross attention from transformers
* pass arg
* rename weights
* updated conversion script
* overwritten commit! fixing PR
* apply feedback
* adding BLTRMSNorm like Llama
* add repeat_kv and eager_attention_forward copied from
* BLTMLP identical to MllamaTextMLP
* clean up some args
* more like mllama, but busier inits
* BLTTransformerLayer config
* decoder, encoder, global configs
* wip working on modular file
* cleaning up patch and configs
* clean up patcher helpers
* clean up patcher helpers further
* clean up
* some config renaming
* clean up unused configs
* clean up configs
* clean up configs
* update modular
* clean
* update demo
* config more like mllama, separated subconfigs from subdicts
* read from config instead of self args
* update demo file
* model weights to causal lm weights
* missed file
* added tied weights keys
* BLTForCausalLM
* adding files after add-new-model-like
* update demo
* working on tests
* first running integration tests
* added integration tests
* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff
* tokenizer clean up
* modular file
* fixing rebase
* ruff
* adding correct basemodel output and updating config with checkpoint vals (for testing)
* BLTModelTests git status
* enabling inputs_embeds, although they won't be equal to input_ids since ids are needed for the patching logic
* fix sdpa == causal tests
* fix small model test and some gradient checkpointing
* skip training GC tests
* fix test
* updated modular
* update modular
* ruff
* adding modular + modeling
* modular
* more modern is_causal check
* cleaning up modular
* more modular reduction
* ruff
* modular fix
* fix styling
* return 2
* return 2
* fix some tests
* fix bltcrossattention after modular break
* some fixes / feedback
* try cache generate fix
* try cache generate fix
* fix generate tests
* attn_impl workaround
* refactoring to use recent TransformersKwargs changes
* fix hidden_states shape test
* refactor to new outputs
* simplify outputs a bit
* rm unneeded decoderlayer overwriting
* rename blt
* forgot tokenizer test renamed
* Reorder
* Reorder
* working on modular
* updates from modular
* new modular
* ruff and such
* update pretrainedmodel modular
* using cohere2 apply_rotary_pos_emb
* small changes
* apply feedback r2
* fix cross_attention
* apply more feedback
* update modeling fix
* load submodules from pretrainedmodel
* set initializer_range to subconfigs
* rm cross_attention_states pass when not needed
* add 7b projection layer support
* check repo
* make copies
* lost cohere2 rotate_half
* ruff
* copies?
* don't tie weights for submodules
* tie weights setting
* check docstrings
* apply feedback
* rebase
* rebased modeling
* update docs
* applying feedback
* few more fixes
* fix can_record_outputs
* fast tokenizer
* no more modulelist
* tok auto
* rm tokenizersss
* fix docs
* ruff
* fix after rebase
* fix test, configs are not subscriptable
---------
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Lysandre <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
* [docs] rm stray tf/flax autodocs references (#40999)
rm tf references
* [`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)
* fix
* fixup inits
* oops
* fixup gemma
* fixup modular order
* how does this keep happen lol
* vaultgemma is new i forgot
* remove init check
* Make `EfficientLoFTRModelTest` faster (#41000)
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Fix typos in src and tests (#40845)
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)
* Fix model cards and modalities in toctree
* fix new models
* RUFF fix on CI scripts (#40805)
Signed-off-by: Yuanyuan Chen <[email protected]>
* fix dict like init for ModelOutput (#41002)
* fix dict like init
* style
* 🚨 [v5] remove generate output retrocompatibility aliases (#40998)
remove old type aliases
* [tests] update `test_left_padding_compatibility` (and minimize overwrites) (#40980)
* update test (and overwrites)
* better test comment
* 0 as a default for
* Patch more `unittest.case.TestCase.assertXXX` methods (#41008)
fix
Co-authored-by: ydshieh <[email protected]>
* 🚨 [v5] remove deprecated entry point (#40997)
* remove old entry point
* update references to transformers-cli
* 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)
* fix: bug that made early stop change order of matches
* fix: applied code suggestion
Co-authored-by: Pavel Iakubovskii <[email protected]>
* fix: applied code suggestion to modular
* fix: integration tests
---------
Co-authored-by: Pavel Iakubovskii <[email protected]>
* Fix `PhimoeIntegrationTest` (#41007)
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Fix Glm4v test (#41011)
fix
* Update after #41007 (#41014)
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Fix benchmark runner argument name (#41012)
* Adding support for Qwen3Omni (#41025)
* Add Qwen3Omni
* make fix-copies, import properly
* nit
* fix wrong setup. Why was audio_token_id renamed ?
* upds
* more processing fixes
* yup
* fix more generation tests
* down to 1?
* fix import issue
* style, update check repo
* up
* fix quality at my best
* final quality?
* fix doc building
* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE
* SKIP THE TEMPLATE ONE
---------
Co-authored-by: lvyuanjun.lyj <[email protected]>
Co-authored-by: Arthur <[email protected]>
* Making compute_loss_func always take priority in Trainer (#40632)
* logger warn, if-else logic improved
* redundant if condition fix
* Modify Qwen3Omni parameter name since VL changed it (#41045)
Modify parameter name since VL changed it
Co-authored-by: lvyuanjun.lyj <[email protected]>
* Fix Qwen video tests (#41049)
fix test
* [testing] Fix `qwen2_audio` (#41018)
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Fix typing of tuples (#41028)
* Fix tuple typing
Signed-off-by: Yuanyuan Chen <[email protected]>
* More fixes
Signed-off-by: Yuanyuan Chen <[email protected]>
* More fixes
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
* Remove optax (#41030)
Remove optax dep
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix typos in English/Chinese documentation (#41031)
* Fix typos and formatting in English docs
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix typos and formatting in Chinese docs
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
* Use torch.autocast (#40975)
* Use torch.autocast
Signed-off-by: Yuanyuan Chen <[email protected]>
* Format code
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
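The `torch.autocast` switch above replaces the older per-backend autocast helpers. A minimal sketch of the device-agnostic form, assuming CPU with bfloat16 (the function name is illustrative, not part of the change):

```python
import torch

def autocast_matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Device-agnostic autocast context: ops on the eligible list (e.g. matmul)
    # run in the lower-precision dtype inside the block.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        return a @ b

x = torch.randn(4, 8)
y = torch.randn(8, 2)
out = autocast_matmul(x, y)  # bfloat16 result under the autocast context
```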
* docs: improved RoPE function Docstrings (#41004)
* docs: improved RoPE function docstrings
* Update src/transformers/modeling_rope_utils.py
Co-authored-by: Joao Gante <[email protected]>
---------
Co-authored-by: Joao Gante <[email protected]>
* Fix condition for emitting warning when generation exceeds max model length (#40775)
correct warning when generation exceeds max model length
Signed-off-by: Yannick Schnider <[email protected]>
* Fix outdated torch version check (#40925)
Update torch minimum version check to 2.2
Signed-off-by: Yuanyuan Chen <[email protected]>
* Remove doc of tf and flax (#41029)
Signed-off-by: Yuanyuan Chen <[email protected]>
* Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)
* Add whole word masking
* Vectorize whole word masking functions
* Unit test whole word masking
* Remove support for TF in whole word masking
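The whole-word-masking idea above can be sketched as grouping BERT-style `##` subwords into words and masking whole words at once; the function below is a hedged illustration, not the `DataCollatorForLanguageModeling` API:

```python
import random

def whole_word_mask_indices(tokens, mask_prob=0.15, seed=0):
    # Illustrative whole-word masking: subwords marked with a leading "##"
    # are grouped with the preceding token, and masking decisions are made
    # per word so a word is either fully masked or fully kept.
    rng = random.Random(seed)
    words = []  # each entry is a list of token indices forming one word
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    masked = set()
    for word in words:
        if rng.random() < mask_prob:
            masked.update(word)  # mask every subword of the chosen word
    return sorted(masked)
```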
* [testing] Fix `seed_oss` (#41052)
* fix
* fix
* fix
* fix
* fix
* fix
* Update tests/models/seed_oss/test_modeling_seed_oss.py
Co-authored-by: Anton Vlasjuk <[email protected]>
* fix
---------
Co-authored-by: ydshieh <[email protected]>
Co-authored-by: Anton Vlasjuk <[email protected]>
* Remove repeated import (#40937)
* Remove repeated import
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix conflict
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
* Simplify unnecessary Optional typing (#40839)
Remove Optional
Signed-off-by: Yuanyuan Chen <[email protected]>
* Add write token for uploading benchmark results to the Hub (#41047)
* Separate write token for Hub upload
* Address review comments
* Address review comments
* Ci utils (#40978)
* Add CI reports dir to gitignore
* Add utils to run local CI
* Review compliance
* Style
* License
* Remove <frameworkcontent> and <pt> tags from documentation (#41055)
* Remove <frameworkcontent> and <pt> tags
Signed-off-by: Yuanyuan Chen <[email protected]>
* Revert changes
Signed-off-by: Yuanyuan Chen <[email protected]>
* Update docs/source/en/model_doc/madlad-400.md
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Joao Gante <[email protected]>
* Fix CI jobs being all red 🔴 (false positive) (#41059)
fix
Co-authored-by: ydshieh <[email protected]>
* Update quantization CI (#41068)
* fix
* new everything
* fix
* [i18n-bn] Add Bengali language README file (#40935)
* [i18n-bn] Add Bengali language README file and update links in existing language files
* Update Bengali README for clarity and consistency in model descriptions
* Improve documentation and errors in Mamba2-based models (#41063)
* fix bug in Mamba2 docs
* correct 'because on of' issue
* link to other Mamba2 model types
* github URL is not changed
* update error message in generated files
* Update team member list for some CI workflows (#41094)
* update list
* update list
---------
Co-authored-by: ydshieh <[email protected]>
* fix crash when using chat to send 2+ request to gptoss (#40536)
Signed-off-by: Wang, Yi <[email protected]>
* Minor addition, no split modules for VideoMAEE (#41051)
* added no split modules
* fixed typo
---------
Co-authored-by: Raushan Turganbay <[email protected]>
* Switch to `python:3.10-slim` for CircleCI docker images (#41067)
fix
Co-authored-by: ydshieh <[email protected]>
* Fix argument name in benchmarking script (#41086)
* Fix argument name in benchmarking script
* Adjust vars
* Remove mention of TensorFlow/Flax/JAX from English documentation (#41058)
Remove mention of TensorFlow from English documentation
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix typos in documentation (#41087)
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix typing (#40788)
* Fix optional typing
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix optional typing
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix schema typing
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix typing
* Fix typing
* Fix typing
* Fix typing
* Use np.ndarray
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix typing
Signed-off-by: Yuanyuan Chen <[email protected]>
* Format code
Signed-off-by: Yuanyuan Chen <[email protected]>
* Use np.ndarray
Signed-off-by: Yuanyuan Chen <[email protected]>
* Improve typing
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix quote string of np.ndarray
Signed-off-by: Yuanyuan Chen <[email protected]>
* More fixes
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix code
* Format
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
* Remove unused arguments (#40916)
* Fix unused arguments
Signed-off-by: Yuanyuan Chen <[email protected]>
* More fixes
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
* Remove tf and flax from Chinese documentation (#41057)
Signed-off-by: Yuanyuan Chen <[email protected]>
* fix wrong height and width when read video use torchvision (#41091)
* docs: Fix Tool Use links and remove dead RAG links (#41104)
docs: Fix tool use links. Remove dead RAG links. Fix style
* 🚨 [generate] update paligemma mask updates (and other assisted generation-related fixes) (#40917)
* tmp
* fix modular inheritance
* nit
* paligemma 1 doesn't have swa
* use same pattern as in models with hybrid layers
* PR comments
* helium also needs layer_typed (bc it relies on gemma)
* paligemma/gemma3: same mask creation fn in fwd and generate
* propagate changes to helium (gemma-based)
* tmp commit
* slow paligemma tests passing, let's see what breaks
* fix test_left_padding_compatibility
* tmp commit
* tmp commit
* rebase error
* docs
* reduce diff
* like this?
* t5gemma
* better comment
* shorter diff
* exception
* ffs type
* optional
* shorter modular_gemma.py
* helium model actually needs no changes -- the tester is the issue
* t5gemma modular config
* a few more modular; paligemma BC
* fix processor issues?
* rm config exception
* lift warning in gemma
* [tests] gpt2 + `CausalLMModelTester` (#41003)
* tmp commit
* tmp commit
* tmp commit
* rm old GPT2ModelTester
* nit bug
* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns
* vision_encoder_decoder
* Fix `_get_test_info` for inherited tests (#41106)
* fix _get_test_info
* fix patched
* add comment
* ruff
---------
Co-authored-by: ydshieh <[email protected]>
* Remove bad test skips (#41109)
* remove bad skips
* remove more
* fix inits
* Format empty lines and white space in markdown files. (#41100)
* Remove additional white space and empty lines from markdown files
Signed-off-by: Yuanyuan Chen <[email protected]>
* Add empty lines around code
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
* Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)
Update ruff to 0.13.1 target it to Python 3.10 and apply its fixes
Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Yih-Dar <[email protected]>
* 🚨 [V5] Remove deprecated training arguments (#41017)
* Remove deprecated training arguments from V5
Signed-off-by: Yuanyuan Chen <[email protected]>
* Remove deprecated training arguments from V5
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix comments
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix code
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
* Support loading LFM2 GGUF (#41111)
* add gguf config mapping for lfm2
* add lfm2 tensor process to unsqueeze conv weights
* adjust values from gguf config to HF config
* add test for lfm2 gguf
* ruff
---------
Co-authored-by: Marc Sun <[email protected]>
* [torchao safetensors] integrate torchao safetensors support with transformers (#40735)
* enable torchao safetensors
* enable torchao safetensors support
* add more version checking
* [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)
* fix mismatched dims for qwen3 next
* propagate changes
* chore: renamed tot_heads to total_sequence_length
* Apply suggestion from @vasqu
Co-authored-by: Anton Vlasjuk <[email protected]>
* minor fix to modular qwen3 next file
---------
Co-authored-by: Anton Vlasjuk <[email protected]>
* Fix the error where a keyword argument appearing before *args (#41099)
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix broken `` expressions in markdown files (#41113)
Fix broken expressions in markdown files
Signed-off-by: Yuanyuan Chen <[email protected]>
* Remove self-assignment (#41062)
* Remove self-assignment
Signed-off-by: Yuanyuan Chen <[email protected]>
* Update src/transformers/integrations/flash_paged.py
Co-authored-by: Matt <[email protected]>
* Clear pass
Signed-off-by: Yuanyuan Chen <[email protected]>
* Clear pass
Signed-off-by: Yuanyuan Chen <[email protected]>
* Clear pass
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Matt <[email protected]>
* 🚨Refactor: Update text2text generation pipelines to use max_new_tokens… (#40928)
* Refactor: Update text2text generation pipelines to use max_new_tokens and resolve max_length warning
* docs(text2text_generation): Update parameter comments to reflect modern generation practice
Updated the max_length parameter comment to max_new_tokens, matching the modern standard of specifying the number of newly generated tokens
* refactor(text2text_generation): Remove outdated input validation logic
* docs(text2text_generation): Revert incorrectly modified comment
* docs(text2text_generation): Revert incorrectly modified comment
* Fixed MXFP4 model storage issue (#41118)
* Fixed loading LongT5 from legacy checkpoints (#40724)
* Fixed loading LongT5 from legacy checkpoints
* Adapted the fix to work with missing lm_head
* dummy commit (#41133)
* dummy commit, nothing interesting
* dummy commit, nothing interesting
* dummy commit, nothing interesting
* dummy commit, nothing interesting
---------
Co-authored-by: ydshieh <[email protected]>
* Fix loading logic flaw with regards to unexpected and missing keys (#40850)
* Unexpected keys should be ignored at load with device map
* remove them all
* fix logic flaw
* fix
* simplify
* style
* fix
* revert caching allocator change
* add other test
* add nice doc
---------
Co-authored-by: Cyril Vallez <[email protected]>
* Using torch.distributions.Categorical
* Resolving logits_process.py Issues
* style: autoformat with make fixup
* Update logits_process.py removed defaults
* Variable H name -> cumulative_entropy
* Resolving format error
* Correction of the loop variables in logit processor
* Vectorized the loop in logits_process
* formatted logits_process
* paper reference and stopping rule comment logits_process
* Trigger CI rerun
* Update logits_process.py
* added test_TopH_example_integration
* added test_TopH_example_integration
* Update README.md
* Restore CI config to match main (remove accidental changes)
* Restore CI config to match upstream main (no diffs)
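The Top-H bullets above (the `cumulative_entropy` rename, the vectorized loop, the stopping rule) describe entropy-bounded truncation. A minimal sketch of that rule, assuming a fixed entropy budget in nats; the function name and `entropy_budget` parameter are illustrative, not the exact `TopH` warper API:

```python
import torch

def top_h_filter(logits: torch.Tensor, entropy_budget: float = 2.0,
                 filter_value: float = -float("inf")) -> torch.Tensor:
    # Entropy-bounded truncation: keep the most probable tokens whose
    # cumulative entropy contribution stays within `entropy_budget`.
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    # Per-token entropy contribution -p * log p (zero where p == 0)
    contrib = torch.where(
        sorted_probs > 0,
        -sorted_probs * torch.log(sorted_probs),
        torch.zeros_like(sorted_probs),
    )
    cumulative_entropy = torch.cumsum(contrib, dim=-1)
    # Stopping rule: drop tokens once the running entropy exceeds the
    # budget, but always keep at least the top-1 token.
    sorted_to_remove = cumulative_entropy > entropy_budget
    sorted_to_remove[..., 0] = False
    # Scatter the mask back from sorted order to vocabulary order.
    to_remove = sorted_to_remove.scatter(-1, sorted_idx, sorted_to_remove)
    return logits.masked_fill(to_remove, filter_value)
```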
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
Signed-off-by: greg-kwasniewski1 <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Wang, Yi <[email protected]>
Co-authored-by: ArminAzizi98 <[email protected]>
Co-authored-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Joao Gante <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>
Co-authored-by: Mohamed Mekkouri <[email protected]>
Co-authored-by: Yuchao Zhang <[email protected]>
Co-authored-by: Anton Vlasjuk <[email protected]>
Co-authored-by: Pavel Iakubovskii <[email protected]>
Co-authored-by: Bo Zheng <[email protected]>
Co-authored-by: bozheng-hit <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>
Co-authored-by: Rémi Ouazan <[email protected]>
Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: Ryan Mullins <[email protected]>
Co-authored-by: Amer <[email protected]>
Co-authored-by: eustlb <[email protected]>
Co-authored-by: Albert Villanova del Moral <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Ákos Hadnagy <[email protected]>
Co-authored-by: Grzegorz Kwasniewski <[email protected]>
Co-authored-by: NanoCode012 <[email protected]>
Co-authored-by: Arthur <[email protected]>
Co-authored-by: 艾力可 <[email protected]>
Co-authored-by: JJJYmmm <[email protected]>
Co-authored-by: Manuel de Prada Corral <[email protected]>
Co-authored-by: Samuel Barry <[email protected]>
Co-authored-by: yonigozlan <[email protected]>
Co-authored-by: HyunZ118 <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: YONGSANG <[email protected]>
Co-authored-by: Yijun Lee <[email protected]>
Co-authored-by: Yih-Dar <[email protected]>
Co-authored-by: ydshieh <[email protected]>
Co-authored-by: Pablo Montalvo <[email protected]>
Co-authored-by: Shane A <[email protected]>
Co-authored-by: Xuehai Pan <[email protected]>
Co-authored-by: Matt <[email protected]>
Co-authored-by: Raushan Turganbay <[email protected]>
Co-authored-by: Aritra Roy Gosthipaty <[email protected]>
Co-authored-by: vb <[email protected]>
Co-authored-by: Yaswanth Gali <[email protected]>
Co-authored-by: Akshay Babbar <[email protected]>
Co-authored-by: liangel-02 <[email protected]>
Co-authored-by: Duc-Viet Hoang <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: jiqing-feng <[email protected]>
Co-authored-by: lilin-1 <[email protected]>
Co-authored-by: Matej Sirovatka <[email protected]>
Co-authored-by: Jack <[email protected]>
Co-authored-by: Rangehow <[email protected]>
Co-authored-by: rangehow <[email protected]>
Co-authored-by: Anna <[email protected]>
Co-authored-by: Anna Banaszak <[email protected]>
Co-authored-by: Hamish Scott <[email protected]>
Co-authored-by: Harshal Janjani <[email protected]>
Co-authored-by: Branden <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Benjamin Bossan <[email protected]>
Co-authored-by: Ita Zaporozhets <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Lysandre <[email protected]>
Co-authored-by: StevenBucaille <[email protected]>
Co-authored-by: BakerBunker <[email protected]>
Co-authored-by: lvyuanjun.lyj <[email protected]>
Co-authored-by: Arthur <[email protected]>
Co-authored-by: Ayush <[email protected]>
Co-authored-by: Ryan Mullins <[email protected]>
Co-authored-by: Yannick Schnider <[email protected]>
Co-authored-by: Ralph Gleaton <[email protected]>
Co-authored-by: Saidur Rahman Pulok <[email protected]>
Co-authored-by: Nick Doiron <[email protected]>
Co-authored-by: Wang, Yi <[email protected]>
Co-authored-by: Duygu Altinok <[email protected]>
Co-authored-by: Jinde.Song <[email protected]>
Co-authored-by: hbenoit <[email protected]>
Co-authored-by: nnul <[email protected]>
Co-authored-by: YangKai0616 <[email protected]>
Co-authored-by: Karol Szustakowski <[email protected]>
Co-authored-by: souvikku <[email protected]>
File tree
8 files changed, +243 −0 lines changed
- docs/source/en/internal
- src/transformers
- src/transformers/generation
- tests/generation