Commit 82ffeb2
Add Top-H decoding (entropy-bounded truncation) as a LogitsWarper for text generation (#40837)
* init
* added TopH
* Update TopH logits_process.py
* Update logits_process.py
* Update test_logits_process.py
* Update test_logits_process.py
* added test No. 4
* Resolving __init__.py issues
* Resolving configuration_utils.py Issues
* Resolving logits_process.py Issues
* Resolving utils.py Issues
* Resolving test_logits_process.py Issues
* Resolving __init__.py issues
* Resolving logits_process.py Issues
* Resolving __init__.py issues
* Updated Docs
* Updated Docstring
* style: autoformat with make fixup
* Fixing Docstring
* Update logits_process.py removed defaults
* Rename variable H -> cumulative_entropy
* Using torch.distributions.Categorical
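The commit trail above outlines the idea: sort token probabilities, accumulate their per-token entropy contributions, and truncate once a bound is exceeded (the commits suggest the merged code computes entropy via `torch.distributions.Categorical`). A minimal standalone sketch — the name `top_h_filter`, the `entropy_budget` parameter, and the exact stopping rule are illustrative assumptions, not the merged `TopH` implementation:

```python
import torch

def top_h_filter(logits: torch.Tensor, entropy_budget: float = 2.0) -> torch.Tensor:
    """Keep the smallest high-probability prefix whose cumulative entropy
    contribution -p*log(p) stays within `entropy_budget`; mask the rest."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    # Per-token entropy contribution, accumulated over the sorted vocab.
    contrib = -sorted_probs * torch.log(sorted_probs.clamp_min(1e-12))
    cumulative_entropy = contrib.cumsum(dim=-1)
    keep = cumulative_entropy <= entropy_budget
    keep[..., 0] = True  # always keep the argmax token
    # Scatter the keep mask back to the original vocab order.
    keep_orig = torch.zeros_like(keep)
    keep_orig.scatter_(-1, sorted_idx, keep)
    return logits.masked_fill(~keep_orig, float("-inf"))
```

A tight budget keeps only the head of the distribution; a loose one degenerates toward plain sampling.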
* Improve torch_dtype checks (#40808)
* Improve torch_dtype checks
Signed-off-by: Yuanyuan Chen <[email protected]>
* Apply suggestions from code review
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Joao Gante <[email protected]>
* Add VideoProcessors to auto-backend requirements (#40843)
* add it
* fix existing ones
* add perception to auto_mapping...
* Adds Causal Conv 1D kernel for mamba models (#40765)
* add kernel
* make style
* keep causal-conv1d
* small fix
* small fix
* fix modular converter
* modular fix + lazy loading
* revert changes modular
* nit
* hub kernels update
* update
* small nit
* Update no split modules in T5Gemma model (#40810)
* Update no split modules in T5Gemma model
* Update no_split_modules also for T5Gemma modular
* Remove model_split_percents from test cases
---------
Co-authored-by: Anton Vlasjuk <[email protected]>
* Replace image classification loss functions to `self.loss_function` (#40764)
* Fix the misalignment between the l2norm in GDN of Qwen3-Next and the implementation in the FLA library. (#40842)
* align torch implementation of gdn with fla.
* fix fla import.
* fix
* remove unused attr
* fixes
* strictly align l2norm in Qwen3-Next with FLA implementation.
---------
Co-authored-by: bozheng-hit <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>
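For reference, an l2norm of the kind used in gated-delta-net code normalizes along the last (head) dimension with a fused reciprocal square root. A sketch under stated assumptions — the eps value and its placement are illustrative; the point of this PR was matching FLA's exact formulation, which may differ:

```python
import torch

def l2norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Normalize along the last dim: x / sqrt(sum(x^2) + eps),
    # written as x * rsqrt(...) to use one fused op.
    return x * torch.rsqrt(x.pow(2).sum(dim=-1, keepdim=True) + eps)
```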
* Fixes for continuous batching (#40828)
* Fix for CB attn mask and refactor
* Tests for CB (not all passing)
* Passing tests and a logger fix
* Fixed the KV metrics that were broken when we moved to hybrid alloc
* Fix circular import and style
* Added tests for FA
* Unfolded test to have device expectations
* Fixes for H100
* more fixes for h100
* H100 are good
* Style
* Adding some comments from #40831
* Rename test
* Avoid 1 letter variables
* Dictionary is only removed during kwargs
* Test for supported sample
* Fix an involuntary slice
* Fixes for non-sliced inputs and small example improvements
* Slice inputs is more understandable
* Style
* [tests] re-enable aria fast tests (#40846)
* rise from the dead
* test
* [SAM2] Fix inconsistent results with original implementation with input boxes (#40800)
* Fix inconsistencies with box input inference with original repo
* remove print
* always pad
* fix modular
* [Sam2Video] Fix video inference with batched boxes and add test (#40797)
fix video inference with batched boxes and add test
* add: differential privacy research model (#40851)
* VaultGemma
* Removing Sequence and Token classification models. Removing integration tests for now
* Remove pass-only modular code. style fixes
* Update vaultgemma.md
* Update docs/source/en/model_doc/vaultgemma.md
Co-authored-by: Anton Vlasjuk <[email protected]>
* Update docs/source/en/model_doc/vaultgemma.md
Co-authored-by: Anton Vlasjuk <[email protected]>
* Add links to model doc
* Correct model doc usage examples
* Updating model doc to describe differences from Gemma 2
* Update model_doc links
* Adding integration tests
* style fixes
* repo consistency
* attribute exception
---------
Co-authored-by: Amer <[email protected]>
Co-authored-by: Anton Vlasjuk <[email protected]>
* [test] Fix test_eager_matches_sdpa incorrectly skipped (#40852)
* output_attentions in typed kwargs
* correct typing in GenericForTokenClassification
* improve
* [tests] move generative tests away from `test_modeling_common.py` (#40854)
move tests
* [generate] Always use decoder config to init cache (#40772)
* mega derp
* fix
* always use the decoder
* Use checkpoint in auto_class_docstring (#40844)
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix TrainingArguments.parallelism_config NameError with accelerate<1.10.1 (#40818)
Fix ParallelismConfig type for accelerate < 1.10.1
Co-authored-by: Marc Sun <[email protected]>
* Redirect MI355 CI results to dummy dataset (#40862)
* [Bug fix #40813] Fix base_model_tp_plan of Starcoder2 model. (#40814)
Signed-off-by: greg-kwasniewski1 <[email protected]>
* [docstrings / type hints] Update outdated annotations for `past_key_values` (#40803)
* some fixes
* nits
* indentation
* indentation
* a bunch of type hints
* bulk changes
* fix florence kwargs (#40826)
* fix: XIELU act parameters not being casted to correct dtype (#40812)
* Update model tags and integration references in bug report (#40881)
* [Qwen3 Next] Use numerically stable `rsqrt` (#40848)
use numerically stable inverse
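An RMSNorm-style sketch of the pattern (illustrative, not the exact Qwen3-Next code): multiplying by `torch.rsqrt(variance + eps)` keeps eps inside the root and avoids a separate division whose denominator could round toward zero.

```python
import torch

def rms_scale(hidden_states: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Numerically stable inverse: rsqrt(mean(x^2) + eps) is a single fused
    # op and never divides by a near-zero intermediate, unlike
    # hidden_states / torch.sqrt(variance).
    variance = hidden_states.pow(2).mean(-1, keepdim=True)
    return hidden_states * torch.rsqrt(variance + eps)
```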
* Adding Support for Qwen3-VL Series (#40795)
* add qwen3vl series
* make fixup
* fix import
* re-protect import
* fix it finally (need to merge main into the branch)
* skip processor test (need the checkpoint)
* oups typo
* simplify modular
* remove unnecessary attr
* fix layer
* remove unused rope_deltas args
* reuse image def
* remove unnecessary imports
---------
Co-authored-by: Cyril Vallez <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>
* [`VaultGemma`] Update expectations in integration tests (#40855)
* fix tests
* style
* Fix modular consistency (#40883)
* reapply modular
* add missing one
* 🔴 Move variable output controls to `_prepare_generation_config` (#40715)
* move checks to validate steps where possible
* fix csm and other models that override _sample
* ops dia you again
* opsie
* joao review
* Move variable output controls to `prepare_inputs_for_generation`
* fix a bunch of models
* back to basics
* final touches
* Clarify passing is_causal in sdpa_attention_paged_forward (#40838)
* Correctly pass is_causal in sdpa_attention_paged_forward
Signed-off-by: Yuanyuan Chen <[email protected]>
* Improve typing
Signed-off-by: Yuanyuan Chen <[email protected]>
* Add comment
Signed-off-by: Yuanyuan Chen <[email protected]>
* Improve comments
Signed-off-by: Yuanyuan Chen <[email protected]>
* Revert typing
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
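The distinction this PR clarifies: `is_causal=True` asks the SDPA kernel to build the causal mask itself, and should not be combined with an explicit mask carrying the same information. A minimal equivalence check with plain `F.scaled_dot_product_attention` (not the paged variant from this PR):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
q = torch.randn(1, 2, 4, 8)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)

# is_causal=True: the kernel applies a lower-triangular mask internally.
out_flag = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Equivalent explicit boolean mask (True = attend).
causal = torch.tril(torch.ones(4, 4, dtype=torch.bool))
out_mask = F.scaled_dot_product_attention(q, k, v, attn_mask=causal)
```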
* Use torch.expm1 and torch.log1p for better numerical results (#40860)
Signed-off-by: Yuanyuan Chen <[email protected]>
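The motivation for this change: for tiny x, `exp(x) - 1` and `log(1 + x)` suffer catastrophic cancellation, while `expm1`/`log1p` stay accurate. The torch functions have the same semantics; plain `math` is shown here for brevity:

```python
import math

x = 1e-17  # below double-precision epsilon relative to 1.0
naive = math.exp(x) - 1.0      # 1.0 + 1e-17 rounds to 1.0, so this is 0.0
accurate = math.expm1(x)       # ~1e-17, no cancellation
log_naive = math.log(1.0 + x)  # also rounds to 0.0
log_accurate = math.log1p(x)   # ~1e-17
```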
* Add Fast PromptDepthAnything Processor (#40602)
* Test & import setup
* First version passing tests
* Ruff
* Dummy post processing
* Add numerical test
* Adjust
* Doc
* Ruff
* remove unused arg
* Refine interpolation method and push test script
* update bench
* Comments
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Yoni Gozlan <[email protected]>
* Remove benchmark script
* Update docstrings
* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py
Co-authored-by: Yoni Gozlan <[email protected]>
* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py
Co-authored-by: Yoni Gozlan <[email protected]>
* doc
* further process kwargs
* remove it
* remove
* Remove to dict
* remove crop middle
* Remove param specific handling
* Update testing logic
* remove ensure multiple of as kwargs
* fix formatting
* Remove none default and get image size
* Move stuff to _preprocess_image_like_inputs and refacto
* Clean
* ruff
* End of file & comments
* ruff again
* Padding fixed
* Remove comments to pass tests
* Remove prompt depth from kwargs
* Adjust output_size logic
* Docstring for preprocess
* auto_docstring for preprocess
* pass as an arg
* update test batched
* stack images
* remove prompt scale to meter
* return tensors back in preprocess
* remove copying of images
* Update behavior to match old processor
* Fix batch size of tests
* fix test and fast
* Fix slow processor
* Put tests back to pytorch
* remove check and modify batched tests
* test do_pad + slow processor fix
---------
Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: yonigozlan <[email protected]>
* Fix deta loading & dataclass (#40878)
* fix
* fix 2
* Remove dict branch of attention_mask in sdpa_attention_paged_forward (#40882)
Remove dict branch of attention_mask
Signed-off-by: Yuanyuan Chen <[email protected]>
* 🌐 [i18n-KO] Translated smolvlm.md to Korean (#40414)
* fix: manual edits
* Apply suggestions from code review
* Update docs/source/ko/model_doc/smolvlm.md
* Update docs/source/ko/model_doc/smolvlm.md
* Update docs/source/ko/model_doc/smolvlm.md
* Update docs/source/ko/model_doc/smolvlm.md
* Update docs/source/ko/_toctree.yml
Co-authored-by: Steven Liu <[email protected]>
---------
Co-authored-by: Steven Liu <[email protected]>
* 🌐 [i18n-KO] Translated `imageprocessor.md` to Korean (#39557)
* feat: manual translation
* docs: fix ko/_toctree.yml
* Apply suggestions from code review
Co-authored-by: YONGSANG <[email protected]>
Co-authored-by: Yijun Lee <[email protected]>
* Update docs/source/ko/image_processors.md
Co-authored-by: Steven Liu <[email protected]>
---------
Co-authored-by: YONGSANG <[email protected]>
Co-authored-by: Yijun Lee <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
* [generate] remove docs of a feature that no longer exists (#40895)
* Make debugging failing tests (check and update expect output values) easier 🔥 (#40727)
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Fixing the call to kernelize (#40628)
* fix
* style
* overload train and eval
* add getter and setter
* Fix getter regression (#40824)
* test things
* style
* move tests to a sane place
* Fix flaky `Gemma3nAudioFeatureExtractionTest::test_dither` (#40902)
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* [cache] Merge static sliding and static chunked layer (#40893)
* merge
* get rid of tensors in get_mask_sizes!!
* remove branch
* add comment explanation
* re-add the class with deprecation cycle
* Harmonize CacheLayer names (#40892)
* unify naming
* style
* doc as well
* post rebase fix
* style
* style
* revert
* [cache] Only use scalars in `get_mask_sizes` (#40907)
* remove tensor ops
* style
* style
* Set seed for `Glm4vIntegrationTest` (#40905)
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Add Olmo3 model (#40778)
* transformers add-new-model-like for Olmo3
* Implement modular Olmo3
* Update Olmo3 tests
* Copy Olmo2 weight converter to Olmo3
* Implement Olmo3 weight converter
* Fix code quality errors
* Remove unused import
* Address rope-related PR comments
* Update Olmo3 model doc with minimal details
* Fix Olmo3 rope test failure
* Fix 7B integration test
* remove dummy EncodingFast (#40864)
Signed-off-by: Yuanyuan Chen <[email protected]>
* Improve module name handling for local custom code (#40809)
* Improve module name handling for local custom code
* Use `%lazy` in logging messages
* Revert "Use `%lazy` in logging messages"
This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.
* Add notes for sanitization rule in docstring
* Remove too many underscores
* Update src/transformers/dynamic_module_utils.py
* Update src/transformers/dynamic_module_utils.py
---------
Co-authored-by: Matt <[email protected]>
* Remove `runner_map` (#40880)
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* disable `test_fast_is_faster_than_slow` (#40909)
fix
Co-authored-by: ydshieh <[email protected]>
* [gemma3] `Gemma3ForConditionalGeneration` compatible with assisted generation (#40791)
* gemma3vision compatible with assisted generation
* docstring
* BC
* docstring
* failing checks
* make fixup
* apply changes to modular
* misc fixes
* is_initialized
* fix poor rebase
* [generate] misc fixes (#40906)
misc fixes
* 🔴Make `center_crop` fast equivalent to slow (#40856)
make center_crop fast equivalent to slow
* Fix dtype in Paligemma (#40912)
* fix dtypes
* fix copies
* delete unused attr
* [Docs] Adding documentation of MXFP4 Quantization (#40885)
* adding mxfp4 quantization docs
* review suggestions
* Apply suggestions from code review
Co-authored-by: vb <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
---------
Co-authored-by: vb <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
* Processor load with multi-processing (#40786)
push
* [Llama4] Remove `image_sizes` arg and deprecate `vision_feature_layer` (#40832)
* Remove unused arg
* deprecate
* revrt one change
* get set go
* version correction
* fix
* make style
* comment
* Fix #40067: Add dedicated UMT5 support to GGUF loader (config, tokenizer, test) (#40218)
* Fix #40067: add UMT5 support in GGUF loader (config, tokenizer, test)
* chore: fix code formatting and linting issues
* refactor: move UMT5 GGUF test to quantization directory and clean up comments
* chore: trigger CI pipeline
* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.
* Add regression check to UMT5 encoder GGUF test
Verify encoder output against reference tensor values with appropriate tolerances for stability.
* Update tests/quantization/ggml/test_ggml.py
Co-authored-by: Mohamed Mekkouri <[email protected]>
* Update tests/quantization/ggml/test_ggml.py
remove comments
Co-authored-by: Mohamed Mekkouri <[email protected]>
---------
Co-authored-by: Mohamed Mekkouri <[email protected]>
* [torchao safetensors] renaming get_state_dict function (#40774)
renaming get_state_dict function
Co-authored-by: Mohamed Mekkouri <[email protected]>
* Adding activation kernels (#40890)
* first commit
* add mode
* revert modeling
* add compile
* rm print
* Minor fix for #40727 (#40929)
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Add support for Florence-2 training (#40914)
* Support training florence2
* update doc and testing model to florence-community
* fix florence-2 test, use head dim 16 instead of 8 for fa2
* skip test_sdpa_can_dispatch_on_flash
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add LongCat-Flash (#40730)
* working draft for LongCat
* BC changes to deepseek_v3 for modular
* format
* various modularities
* better tp plan
* better init
* minor changes
* make modular better
* clean up patterns
* Revert a couple of modular commits, because we won't convert in the end
* make things explicit.
* draft test
* toctree, tests and imports
* drop
* woops
* make better things
* update test
* update
* fixes
* style and CI
* convert stuff
* up
* ah, yes, that
* enable gen tests
* fix cache shape in test (sum of 2 things)
* fix tests
* comments
* re-Identitise
* minimize changes
* better defaults
* modular betterment
* fix configuration, add documentation
* fix init
* add integration tests
* add info
* simplify
* update slow tests
* fix
* style
* some additional long tests
* cpu-only long test
* fix last tests?
* urg
* cleaner tests why not
* fix
* improve slow tests, no skip
* style
* don't upcast
* one skip
* finally fix parallelism
* [DOC] Add missing dates in model cards (#40922)
add missing dates
* [models] remove unused `import torch.utils.checkpoint` (#40934)
* Intel CPU dockerfile (#40806)
* upload intel cpu dockerfile
Signed-off-by: jiqing-feng <[email protected]>
* update cpu dockerfile
Signed-off-by: jiqing-feng <[email protected]>
* update label name
Signed-off-by: jiqing-feng <[email protected]>
---------
Signed-off-by: jiqing-feng <[email protected]>
* docs(i18n): Correct the descriptive text in the README_zh-hans.md (#40941)
* Fix trainer tests (#40823)
* fix liger
* fix
* more
* fix
* fix hp
* fix
---------
Co-authored-by: Matej Sirovatka <[email protected]>
* Fix `Glm4vMoeIntegrationTest` (#40930)
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Raise error instead of warning when using meta device in from_pretrained (#40942)
* raise instead of warning
* add timm
* remove
* Consistent naming for images kwargs (#40834)
* use consistent naming for padding
* no validation on pad size
* add warnings
* fix
* fix copies
* another fix
* fix some tests
* fix more tests
* fix lasts tests
* fix copies
* better docstring
* delete print
* Remove nested import logic for torchvision (#40940)
* remove nested import logic for torchvision
* remove unnecessary protected imports
* remove unnecessary protected import in modular (and modeling)
* fix wrongly removed protected imports
* Fix `Glm4vModelTest::test_eager_matches_fa2_generate` (#40947)
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Update expected values for some `test_speculative_generation` (#40949)
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Standardize audio embedding function name for audio multimodal models (#40919)
* Standardize audio embedding function name for audio multimodal models
* PR review
* Add FlexOlmo model (#40921)
* transformers add-new-model-like
* Add FlexOlmo implementation
* Update FlexOlmo docs
* Set default tokenization for flex olmo
* Update FlexOlmo tests
* Update attention comment
* Remove unneeded use of `sliding_window`
* Don't list dropout in eager_paged_attention_forward (#40924)
Remove dropout argument
Signed-off-by: Yuanyuan Chen <[email protected]>
* Update expected values for one more `test_speculative_generation` after #40949 (#40967)
fix
Co-authored-by: ydshieh <[email protected]>
* FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)
* fix(trainer): ensure final checkpoint is saved when resuming training
* add test
* make style && slight fix of test
* make style again
* move test code to test_trainer
* remove outdated test file
* Apply style fixes
---------
Co-authored-by: rangehow <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <[email protected]>
* Add new model LFM2-VL (#40624)
* Add LFM2-VL support
* add tests
* linting, formatting, misc review changes
* add siglip2 to auto config and instantiate it in lfm2-vl configuration
* decouple image processor from processor
* remove torch import from configuration
* replace | with Optional
* remove layer truncation from modeling file
* fix copies
* update everything
* fix test case to use tiny model
* update the test cases
* fix finally the image processor and add slow tests
* fixup
* typo in docs
* fix tests
* the doc name uses underscore
* address comments from Yoni
* delete tests and unshuffling
* relative import
* do we really handle imports better now?
* fix test
* slow tests
* found a bug in ordering + slow tests
* fix copies
* dont run compile test
---------
Co-authored-by: Anna <[email protected]>
Co-authored-by: Anna Banaszak <[email protected]>
* Fix outdated version checks of accelerator (#40969)
* Fix outdated version checks of accelerator
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix outdated version checks of accelerator
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
* Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)
use skip_predictor in vjepa2 `get_vision_features`
* [Trainer] Fix DP loss (#40799)
* fix
* style
* Fix fp16
* style
---------
Co-authored-by: Matej Sirovatka <[email protected]>
* [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)
* fix(timm): Add exception handling for unknown Gemma3n model
* nit: Let’s cater to this specific issue
* nit: Simplify error handling
* Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)
* fix merge conflicts
* change token typing
---------
Co-authored-by: Ubuntu <[email protected]>
* [tests] Really use small models in all fast tests (#40945)
* start
* xcodec
* chameleon
* start
* layoutlm2
* layoutlm
* remove skip
* oups
* timm_wrapper
* add default
* doc
* consistency
* Add captured actual outputs to CI artifacts (#40965)
* fix
* fix
* Remove `# TODO: ???` as it makes me `???`
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Revert change in `compile_friendly_resize` (#40645)
fix
* Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Remove `set_model_tester_for_less_flaky_tests` (#40982)
remove
* Benchmarking v2 GH workflows (#40716)
* WIP benchmark v2 workflow
* Container was missing
* Change to sandbox branch name
* Wrong place for image name
* Variable declarations
* Remove references to file logging
* Remove unnecessary step
* Fix deps install
* Syntax
* Add workdir
* Add upload feature
* typo
* No need for hf_transfer
* Pass in runner
* Runner config
* Runner config
* Runner config
* Runner config
* Runner config
* mi325 caller
* Name workflow runs properly
* Copy-paste error
* Add final repo IDs and schedule
* Review comments
* Remove wf params
* Remove parametrization from workflow files
* Fix callers
* Change push trigger to pull_request + label
* Add back schedule event
* Push to the same dataset
* Simplify parameter description
* 🔴[`Attention`] Bert-based Models Attention Refactor (#38301)
* clean start to bert refactor
* some test fixes
* style
* fix last tests
* be strict on positional embeddings, fixup according tests
* cache support
* more cache fixes, new causal API
* simplify masks, fix tests for gen
* flex attn, static cache support, round of fixes
* ?
* this time
* style
* fix flash attention tests, flex attention requires torch 2.7.x to work with multiple classes (as recompile strats force a size call which is wrongly interpreted before)
* roberta
* fixup sdpa remains
* attention split, simplify args and kwargs, better typing
* fix encoder decoder
* fix test
* modular roberta
* albert
* data2vectext, making it modular tomorrow
* modular data2vec text
* tmp disable
* xmod + cache position fixes
* whoops
* electra + markuplm, small fixes
* remove wrong copy
* xlm_roberta + some embedding fixes
* roberta prelayernorm
* RemBert: remove copy, maybe doing it later
* ernie
* fix roberta offloading
* camembert
* copy fixes
* bert generation + fixes on eager
* xlm roberta xl
* bridgetower (text) + seamlessv2 copy fixes
* rocbert + small fixes
* whoops
* small round of fixups
* NOTE: kernels didn't load with an earlier version, some fixup (needs another look bc cross deps)
* the end of the tunnel?
* fixup nllbmoe + style
* we don't need this anymore
* megatron bert is barely used, low prio skip for now
* Modernize bert (template for others)
NOTE: trying to push this through; might be overdue if not possible in time
* check inputs for all others (if checkmarked)
* fix bridgetower
* style
* fix encoder decoder (partially but cause found and fix also, just needs to be done for everything else)
* proper fix for bert to force intermediate dict outputs
* propagate to others
* style
* xlm roberta xl investigation, its the layernorm...
* mobile bert
* revert this, might cause issues with composed models
* review
* style
* Remove [[autodoc]] refs to TF/Flax objects (#40996)
* remove refs
* more
* ENH: Enable readline support for transformers chat (#40911)
ENH Enable readline support for chat
This small change enables GNU readline support for the transformers chat
command. This includes, among others:
- advanced navigation and editing: ctrl+a, ctrl+e, alt+b, alt+f, ctrl+k, alt+d, etc.
- navigate and search history: arrow up/down, ctrl+p, ctrl+n, ctrl+r
- undo: ctrl+_
- clear screen: ctrl+l
Implementation
Although it may look strange, just importing readline is enough to
enable it in Python, see:
https://docs.python.org/3/library/functions.html#input
As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.
Readline should work on Linux, macOS, and WSL; I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline3 (https://pypi.org/project/pyreadline3/).
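The guarded import described above looks roughly like this (a sketch; the actual chat command's surrounding structure differs):

```python
# Merely importing readline upgrades Python's built-in input() with line
# editing and history; guard it because the module is missing on some
# platforms (e.g. Windows builds without pyreadline3).
try:
    import readline  # noqa: F401
    HAS_READLINE = True
except ImportError:
    HAS_READLINE = False

# input("chat> ") now supports ctrl+a / ctrl+e navigation and arrow-key
# history wherever HAS_READLINE is True.
```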
* [testing] test `num_hidden_layers` being small in model tester (#40992)
fix
Co-authored-by: ydshieh <[email protected]>
* blt wip (#38579)
* blt wip
* cpu version
* cpu friendly with full entropy model (real time patching)
* adding config file instead of args file
* enable MPS
* refactoring unused code
* single config class in config file
* inherit from PreTrainedModel
* refactor LMTransformer --> BLTPatcher
* add conversion script
* load from new checkpoint with from_pretrained
* fixed demo from_pretrained
* clean up
* clean a few comments
* cleanup folder
* clean up dir
* cleaned up modeling further
* rename classes
* adding transformers Attention class and RotaryEmbedding class
* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc
* separate out patcher config, update modeling and conversion script
* rename vars to be more transformers-like
* rm unused functions
* adding cross attention from transformers
* pass arg
* rename weights
* updated conversion script
* overwritten commit! fixing PR
* apply feedback
* adding BLTRMSNorm like Llama
* add repeat_kv and eager_attention_forward copied from
* BLTMLP identical to MllamaTextMLP
* clean up some args
* more like mllama, but busier inits
* BLTTransformerLayer config
* decoder, encoder, global configs
* wip working on modular file
* cleaning up patch and configs
* clean up patcher helpers
* clean up patcher helpers further
* clean up
* some config renaming
* clean up unused configs
* clean up configs
* clean up configs
* update modular
* clean
* update demo
* config more like mllama, separated subconfigs from subdicts
* read from config instead of self args
* update demo file
* model weights to causal lm weights
* missed file
* added tied weights keys
* BLTForCausalLM
* adding files after add-new-model-like
* update demo
* working on tests
* first running integration tests
* added integration tests
* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff
* tokenizer clean up
* modular file
* fixing rebase
* ruff
* adding correct basemodel output and updating config with checkpoint vals (for testing)
* BLTModelTests git status
* enabling inputs_embeds, although they won't be equal to input_ids since ids are needed for the patching logic
* fix sdpa == causal tests
* fix small model test and some gradient checkpointing
* skip training GC tests
* fix test
* updated modular
* update modular
* ruff
* adding modular + modeling
* modular
* more modern is_causal check
* cleaning up modular
* more modular reduction
* ruff
* modular fix
* fix styling
* return 2
* return 2
* fix some tests
* fix bltcrossattention after modular break
* some fixes / feedback
* try cache generate fix
* try cache generate fix
* fix generate tests
* attn_impl workaround
* refactoring to use recent TransformersKwargs changes
* fix hidden_states shape test
* refactor to new outputs
* simplify outputs a bit
* rm unneeded decoderlayer overwriting
* rename blt
* forgot tokenizer test renamed
* Reorder
* Reorder
* working on modular
* updates from modular
* new modular
* ruff and such
* update pretrainedmodel modular
* using cohere2 apply_rotary_pos_emb
* small changes
* apply feedback r2
* fix cross_attention
* apply more feedback
* update modeling fix
* load submodules from pretrainedmodel
* set initializer_range to subconfigs
* rm cross_attention_states pass when not needed
* add 7b projection layer support
* check repo
* make copies
* lost cohere2 rotate_half
* ruff
* copies?
* don't tie weights for submodules
* tie weights setting
* check docstrings
* apply feedback
* rebase
* rebased modeling
* update docs
* applying feedback
* few more fixes
* fix can_record_outputs
* fast tokenizer
* no more modulelist
* tok auto
* rm tokenizersss
* fix docs
* ruff
* fix after rebase
* fix test, configs are not subscriptable
---------
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Lysandre <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
* [docs] rm stray tf/flax autodocs references (#40999)
rm tf references
* [`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)
* fix
* fixup inits
* oops
* fixup gemma
* fixup modular order
* how does this keep happen lol
* vaultgemma is new i forgot
* remove init check
* Make `EfficientLoFTRModelTest` faster (#41000)
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Fix typos in src and tests (#40845)
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)
* Fix model cards and modalities in toctree
* fix new models
* RUFF fix on CI scripts (#40805)
Signed-off-by: Yuanyuan Chen <[email protected]>
* fix dict like init for ModelOutput (#41002)
* fix dict like init
* style
* 🚨 [v5] remove generate output retrocompatibility aliases (#40998)
remove old type aliases
* [tests] update `test_left_padding_compatibility` (and minimize overwrites) (#40980)
* update test (and overwrites)
* better test comment
* 0 as a default for
* Patch more `unittest.case.TestCase.assertXXX` methods (#41008)
fix
Co-authored-by: ydshieh <[email protected]>
* 🚨 [v5] remove deprecated entry point (#40997)
* remove old entry point
* update references to transformers-cli
* 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)
* fix: bug that made early stop change order of matches
* fix: applied code suggestion
Co-authored-by: Pavel Iakubovskii <[email protected]>
* fix: applied code suggestion to modular
* fix: integration tests
---------
Co-authored-by: Pavel Iakubovskii <[email protected]>
* Fix `PhimoeIntegrationTest` (#41007)
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Fix Glm4v test (#41011)
fix
* Update after #41007 (#41014)
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Fix benchmark runner argument name (#41012)
* Adding support for Qwen3Omni (#41025)
* Add Qwen3Omni
* make fix-copies, import properly
* nit
* fix wrong setup. Why was audio_token_id renamed ?
* upds
* more processing fixes
* yup
* fix more generation tests
* down to 1?
* fix import issue
* style, update check repo
* up
* fix quality at my best
* final quality?
* fix doc building
* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE
* SKIP THE TEMPLATE ONE
---------
Co-authored-by: lvyuanjun.lyj <[email protected]>
Co-authored-by: Arthur <[email protected]>
* Making compute_loss_func always take priority in Trainer (#40632)
* logger warn, if-else logic improved
* redundant if condition fix
* Modify Qwen3Omni parameter name since VL changed it (#41045)
Modify parameter name since VL changed it
Co-authored-by: lvyuanjun.lyj <[email protected]>
* Fix Qwen video tests (#41049)
fix test
* [testing] Fix `qwen2_audio` (#41018)
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <[email protected]>
* Fix typing of tuples (#41028)
* Fix tuple typing
Signed-off-by: Yuanyuan Chen <[email protected]>
* More fixes
Signed-off-by: Yuanyuan Chen <[email protected]>
* More fixes
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
* Remove optax (#41030)
Remove optax dep
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix typos in English/Chinese documentation (#41031)
* Fix typos and formatting in English docs
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix typos and formatting in Chinese docs
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
* Use torch.autocast (#40975)
* Use torch.autocast
Signed-off-by: Yuanyuan Chen <[email protected]>
* Format code
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
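The `torch.autocast` switch above replaces the older per-backend autocast helpers. A minimal sketch of the device-agnostic form, assuming CPU with bfloat16 (the function name is illustrative, not part of the change):

```python
import torch

def autocast_matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Device-agnostic autocast context: ops on the eligible list (e.g. matmul)
    # run in the lower-precision dtype inside the block.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        return a @ b

x = torch.randn(4, 8)
y = torch.randn(8, 2)
out = autocast_matmul(x, y)  # bfloat16 result under the autocast context
```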
* docs: improved RoPE function Docstrings (#41004)
* docs: improved RoPE function docstrings
* Update src/transformers/modeling_rope_utils.py
Co-authored-by: Joao Gante <[email protected]>
---------
Co-authored-by: Joao Gante <[email protected]>
* Fix condition for emitting warning when generation exceeds max model length (#40775)
correct warning when generation exceeds max model length
Signed-off-by: Yannick Schnider <[email protected]>
* Fix outdated torch version check (#40925)
Update torch minimum version check to 2.2
Signed-off-by: Yuanyuan Chen <[email protected]>
* Remove doc of tf and flax (#41029)
Signed-off-by: Yuanyuan Chen <[email protected]>
* Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)
* Add whole word masking
* Vectorize whole word masking functions
* Unit test whole word masking
* Remove support for TF in whole word masking
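The whole-word-masking idea above can be sketched as grouping BERT-style `##` subwords into words and masking whole words at once; the function below is a hedged illustration, not the `DataCollatorForLanguageModeling` API:

```python
import random

def whole_word_mask_indices(tokens, mask_prob=0.15, seed=0):
    # Illustrative whole-word masking: subwords marked with a leading "##"
    # are grouped with the preceding token, and masking decisions are made
    # per word so a word is either fully masked or fully kept.
    rng = random.Random(seed)
    words = []  # each entry is a list of token indices forming one word
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    masked = set()
    for word in words:
        if rng.random() < mask_prob:
            masked.update(word)  # mask every subword of the chosen word
    return sorted(masked)
```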
* [testing] Fix `seed_oss` (#41052)
* fix
* fix
* fix
* fix
* fix
* fix
* Update tests/models/seed_oss/test_modeling_seed_oss.py
Co-authored-by: Anton Vlasjuk <[email protected]>
* fix
---------
Co-authored-by: ydshieh <[email protected]>
Co-authored-by: Anton Vlasjuk <[email protected]>
* Remove repeated import (#40937)
* Remove repeated import
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix conflict
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
* Simplify unnecessary Optional typing (#40839)
Remove Optional
Signed-off-by: Yuanyuan Chen <[email protected]>
* Add write token for uploading benchmark results to the Hub (#41047)
* Separate write token for Hub upload
* Address review comments
* Address review comments
* Ci utils (#40978)
* Add CI reports dir to gitignore
* Add utils to run local CI
* Review compliance
* Style
* License
* Remove <frameworkcontent> and <pt> tags from documentation (#41055)
* Remove <frameworkcontent> and <pt> tags
Signed-off-by: Yuanyuan Chen <[email protected]>
* Revert changes
Signed-off-by: Yuanyuan Chen <[email protected]>
* Update docs/source/en/model_doc/madlad-400.md
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Joao Gante <[email protected]>
* Fix CI jobs being all red 🔴 (false positive) (#41059)
fix
Co-authored-by: ydshieh <[email protected]>
* Update quantization CI (#41068)
* fix
* new everything
* fix
* [i18n-bn] Add Bengali language README file (#40935)
* [i18n-bn] Add Bengali language README file and update links in existing language files
* Update Bengali README for clarity and consistency in model descriptions
* Improve documentation and errors in Mamba2-based models (#41063)
* fix bug in Mamba2 docs
* correct 'because on of' issue
* link to other Mamba2 model types
* github URL is not changed
* update error message in generated files
* Update team member list for some CI workflows (#41094)
* update list
* update list
---------
Co-authored-by: ydshieh <[email protected]>
* fix crash when using chat to send 2+ request to gptoss (#40536)
Signed-off-by: Wang, Yi <[email protected]>
* Minor addition, no split modules for VideoMAEE (#41051)
* added no split modules
* fixed typo
---------
Co-authored-by: Raushan Turganbay <[email protected]>
* Switch to `python:3.10-slim` for CircleCI docker images (#41067)
fix
Co-authored-by: ydshieh <[email protected]>
* Fix argument name in benchmarking script (#41086)
* Fix argument name in benchmarking script
* Adjust vars
* Remove mention of TensorFlow/Flax/JAX from English documentation (#41058)
Remove mention of TensorFlow from English documentation
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix typos in documentation (#41087)
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix typing (#40788)
* Fix optional typing
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix optional typing
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix schema typing
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix typing
* Fix typing
* Fix typing
* Fix typing
* Use np.ndarray
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix typing
Signed-off-by: Yuanyuan Chen <[email protected]>
* Format code
Signed-off-by: Yuanyuan Chen <[email protected]>
* Use np.ndarray
Signed-off-by: Yuanyuan Chen <[email protected]>
* Improve typing
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix quote string of np.ndarray
Signed-off-by: Yuanyuan Chen <[email protected]>
* More fixes
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix code
* Format
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
* Remove unused arguments (#40916)
* Fix unused arguments
Signed-off-by: Yuanyuan Chen <[email protected]>
* More fixes
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
* Remove tf and flax from Chinese documentation (#41057)
Signed-off-by: Yuanyuan Chen <[email protected]>
* fix wrong height and width when read video use torchvision (#41091)
* docs: Fix Tool Use links and remove dead RAG links (#41104)
docs: Fix tool use links. Remove dead RAG links. Fix style
* 🚨 [generate] update paligemma mask updates (and other assisted generation-related fixes) (#40917)
* tmp
* fix modular inheritance
* nit
* paligemma 1 doesn't have swa
* use same pattern as in models with hybrid layers
* PR comments
* helium also needs layer_typed (bc it relies on gemma)
* paligemma/gemma3: same mask creation fn in fwd and generate
* propagate changes to helium (gemma-based)
* tmp commit
* slow paligemma tests passing, let's see what breaks
* fix test_left_padding_compatibility
* tmp commit
* tmp commit
* rebase error
* docs
* reduce diff
* like this?
* t5gemma
* better comment
* shorter diff
* exception
* ffs type
* optional
* shorter modular_gemma.py
* helium model actually needs no changes -- the tester is the issue
* t5gemma modular config
* a few more modular; paligemma BC
* fix processor issues?
* rm config exception
* lift warning in gemma
* [tests] gpt2 + `CausalLMModelTester` (#41003)
* tmp commit
* tmp commit
* tmp commit
* rm old GPT2ModelTester
* nit bug
* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns
* vision_encoder_decoder
* Fix `_get_test_info` for inherited tests (#41106)
* fix _get_test_info
* fix patched
* add comment
* ruff
---------
Co-authored-by: ydshieh <[email protected]>
* Remove bad test skips (#41109)
* remove bad skips
* remove more
* fix inits
* Format empty lines and white space in markdown files. (#41100)
* Remove additional white space and empty lines from markdown files
Signed-off-by: Yuanyuan Chen <[email protected]>
* Add empty lines around code
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
* Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)
Update ruff to 0.13.1 target it to Python 3.10 and apply its fixes
Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Yih-Dar <[email protected]>
* 🚨 [V5] Remove deprecated training arguments (#41017)
* Remove deprecated training arguments from V5
Signed-off-by: Yuanyuan Chen <[email protected]>
* Remove deprecated training arguments from V5
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix comments
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix code
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
* Support loading LFM2 GGUF (#41111)
* add gguf config mapping for lfm2
* add lfm2 tensor process to unsqueeze conv weights
* adjust values from gguf config to HF config
* add test for lfm2 gguf
* ruff
---------
Co-authored-by: Marc Sun <[email protected]>
* [torchao safetensors] integrate torchao safetensors support with transformers (#40735)
* enable torchao safetensors
* enable torchao safetensors support
* add more version checking
* [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)
* fix mismatched dims for qwen3 next
* propagate changes
* chore: renamed tot_heads to total_sequence_length
* Apply suggestion from @vasqu
Co-authored-by: Anton Vlasjuk <[email protected]>
* minor fix to modular qwen3 next file
---------
Co-authored-by: Anton Vlasjuk <[email protected]>
* Fix the error where a keyword argument appearing before *args (#41099)
Signed-off-by: Yuanyuan Chen <[email protected]>
* Fix broken `` expressions in markdown files (#41113)
Fix broken expressions in markdown files
Signed-off-by: Yuanyuan Chen <[email protected]>
* Remove self-assignment (#41062)
* Remove self-assignment
Signed-off-by: Yuanyuan Chen <[email protected]>
* Update src/transformers/integrations/flash_paged.py
Co-authored-by: Matt <[email protected]>
* Clear pass
Signed-off-by: Yuanyuan Chen <[email protected]>
* Clear pass
Signed-off-by: Yuanyuan Chen <[email protected]>
* Clear pass
Signed-off-by: Yuanyuan Chen <[email protected]>
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Matt <[email protected]>
* 🚨Refactor: Update text2text generation pipelines to use max_new_tokens… (#40928)
* Refactor: Update text2text generation pipelines to use max_new_tokens and resolve max_length warning
* docs(text2text_generation): Update parameter comments to reflect modern generation practice
Updated the max_length parameter comment to max_new_tokens, matching the modern standard of specifying the number of newly generated tokens
* refactor(text2text_generation): Remove outdated input validation logic
* docs(text2text_generation): Revert incorrectly modified comment
* docs(text2text_generation): Revert incorrectly modified comment
* Fixed MXFP4 model storage issue (#41118)
* Fixed loading LongT5 from legacy checkpoints (#40724)
* Fixed loading LongT5 from legacy checkpoints
* Adapted the fix to work with missing lm_head
* dummy commit (#41133)
* dummy commit, nothing interesting
* dummy commit, nothing interesting
* dummy commit, nothing interesting
* dummy commit, nothing interesting
---------
Co-authored-by: ydshieh <[email protected]>
* Fix loading logic flaw with regards to unexpected and missing keys (#40850)
* Unexpected keys should be ignored at load with device map
* remove them all
* fix logic flaw
* fix
* simplify
* style
* fix
* revert caching allocator change
* add other test
* add nice doc
---------
Co-authored-by: Cyril Vallez <[email protected]>
* Using torch.distributions.Categorical
* Resolving logits_process.py Issues
* style: autoformat with make fixup
* Update logits_process.py removed defaults
* Variable H name -> cumulative_entropy
* Resolving format error
* Correction of the loop variables in logit processor
* Vectorized the loop in logits_process
* formatted logits_process
* paper reference and stopping rule comment logits_process
* Trigger CI rerun
* Update logits_process.py
* added test_TopH_example_integration
* added test_TopH_example_integration
* Update README.md
* Restore CI config to match main (remove accidental changes)
* Restore CI config to match upstream main (no diffs)
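The Top-H bullets above (the `cumulative_entropy` rename, the vectorized loop, the stopping rule) describe entropy-bounded truncation. A minimal sketch of that rule, assuming a fixed entropy budget in nats; the function name and `entropy_budget` parameter are illustrative, not the exact `TopH` warper API:

```python
import torch

def top_h_filter(logits: torch.Tensor, entropy_budget: float = 2.0,
                 filter_value: float = -float("inf")) -> torch.Tensor:
    # Entropy-bounded truncation: keep the most probable tokens whose
    # cumulative entropy contribution stays within `entropy_budget`.
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    # Per-token entropy contribution -p * log p (zero where p == 0)
    contrib = torch.where(
        sorted_probs > 0,
        -sorted_probs * torch.log(sorted_probs),
        torch.zeros_like(sorted_probs),
    )
    cumulative_entropy = torch.cumsum(contrib, dim=-1)
    # Stopping rule: drop tokens once the running entropy exceeds the
    # budget, but always keep at least the top-1 token.
    sorted_to_remove = cumulative_entropy > entropy_budget
    sorted_to_remove[..., 0] = False
    # Scatter the mask back from sorted order to vocabulary order.
    to_remove = sorted_to_remove.scatter(-1, sorted_idx, sorted_to_remove)
    return logits.masked_fill(to_remove, filter_value)
```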
---------
Signed-off-by: Yuanyuan Chen <[email protected]>
Signed-off-by: greg-kwasniewski1 <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Wang, Yi <[email protected]>
Co-authored-by: ArminAzizi98 <[email protected]>
Co-authored-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Joao Gante <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>
Co-authored-by: Mohamed Mekkouri <[email protected]>
Co-authored-by: Yuchao Zhang <[email protected]>
Co-authored-by: Anton Vlasjuk <[email protected]>
Co-authored-by: Pavel Iakubovskii <[email protected]>
Co-authored-by: Bo Zheng <[email protected]>
Co-authored-by: bozheng-hit <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>
Co-authored-by: Rémi Ouazan <[email protected]>
Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: Ryan Mullins <[email protected]>
Co-authored-by: Amer <[email protected]>
Co-authored-by: eustlb <[email protected]>
Co-authored-by: Albert Villanova del Moral <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Ákos Hadnagy <[email protected]>
Co-authored-by: Grzegorz Kwasniewski <[email protected]>
Co-authored-by: NanoCode012 <[email protected]>
Co-authored-by: Arthur <[email protected]>
Co-authored-by: 艾力可 <[email protected]>
Co-authored-by: JJJYmmm <[email protected]>
Co-authored-by: Manuel de Prada Corral <[email protected]>
Co-authored-by: Samuel Barry <[email protected]>
Co-authored-by: yonigozlan <[email protected]>
Co-authored-by: HyunZ118 <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: YONGSANG <[email protected]>
Co-authored-by: Yijun Lee <[email protected]>
Co-authored-by: Yih-Dar <[email protected]>
Co-authored-by: ydshieh <[email protected]>
Co-authored-by: Pablo Montalvo <[email protected]>
Co-authored-by: Shane A <[email protected]>
Co-authored-by: Xuehai Pan <[email protected]>
Co-authored-by: Matt <[email protected]>
Co-authored-by: Raushan Turganbay <[email protected]>
Co-authored-by: Aritra Roy Gosthipaty <[email protected]>
Co-authored-by: vb <[email protected]>
Co-authored-by: Yaswanth Gali <[email protected]>
Co-authored-by: Akshay Babbar <[email protected]>
Co-authored-by: liangel-02 <[email protected]>
Co-authored-by: Duc-Viet Hoang <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: jiqing-feng <[email protected]>
Co-authored-by: lilin-1 <[email protected]>
Co-authored-by: Matej Sirovatka <[email protected]>
Co-authored-by: Jack <[email protected]>
Co-authored-by: Rangehow <[email protected]>
Co-authored-by: rangehow <[email protected]>
Co-authored-by: Anna <[email protected]>
Co-authored-by: Anna Banaszak <[email protected]>
Co-authored-by: Hamish Scott <[email protected]>
Co-authored-by: Harshal Janjani <[email protected]>
Co-authored-by: Branden <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Benjamin Bossan <[email protected]>
Co-authored-by: Ita Zaporozhets <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Lysandre <[email protected]>
Co-authored-by: StevenBucaille <[email protected]>
Co-authored-by: BakerBunker <[email protected]>
Co-authored-by: lvyuanjun.lyj <[email protected]>
Co-authored-by: Arthur <[email protected]>
Co-authored-by: Ayush <[email protected]>
Co-authored-by: Ryan Mullins <[email protected]>
Co-authored-by: Yannick Schnider <[email protected]>
Co-authored-by: Ralph Gleaton <[email protected]>
Co-authored-by: Saidur Rahman Pulok <[email protected]>
Co-authored-by: Nick Doiron <[email protected]>
Co-authored-by: Wang, Yi <[email protected]>
Co-authored-by: Duygu Altinok <[email protected]>
Co-authored-by: Jinde.Song <[email protected]>
Co-authored-by: hbenoit <[email protected]>
Co-authored-by: nnul <[email protected]>
Co-authored-by: YangKai0616 <[email protected]>
Co-authored-by: Karol Szustakowski <[email protected]>
Co-authored-by: souvikku <[email protected]>
File tree
8 files changed, +243 −0 lines changed
- docs/source/en/internal
- src/transformers
- src/transformers/generation
- tests/generation