
Conversation

dependabot bot (Contributor) commented on behalf of github on Nov 25, 2025

Bumps transformers from 4.55.2 to 4.57.3.

Release notes

Sourced from transformers' releases.

Patch release v4.57.3

This patch fixes a hidden bug that occurred when loading models with local_files_only=True, as well as a typo related to the recent patch.

The main fix is: huggingface/transformers@b605555.

We are really sorry that this slipped through; our CI simply did not catch it.

As it affects a lot of users, we are going to yank the previous release.
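
For context, the bug surfaced on the offline loading path. Here is a minimal sketch of that path, using an illustrative checkpoint name and assuming the weights are already present in the local Hugging Face cache:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# local_files_only=True loads strictly from the local cache and makes no
# network calls; the checkpoint name below is illustrative.
model_id = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(model_id, local_files_only=True)
```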

Patch Release v4.57.2

This patch most notably fixes an issue with some Mistral tokenizers (a usage sketch follows the commit list). It contains the following commits:

  • Add AutoTokenizer mapping for mistral3 and ministral (#42198)
  • Auto convert tekken.json (#42299)
  • fix tekken pattern matching (#42363)
  • Check model inputs - hidden states (#40994)
  • Remove invalid @staticmethod from module-level get_device_and_memory_breakdown (#41747)
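
As a rough illustration of what the AutoTokenizer mapping fix enables (the checkpoint name is an assumption for illustration, not taken from the release notes):

```python
from transformers import AutoTokenizer

# With the mistral3/ministral mapping in place, AutoTokenizer can resolve
# these checkpoints directly; tekken.json tokenizers are converted on load.
tok = AutoTokenizer.from_pretrained("mistralai/Ministral-8B-Instruct-2410")
print(tok("Hello, world!").input_ids)
```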

Patch release v4.57.1

This patch most notably fixes an issue with an optional dependency (optax), which resulted in parsing errors with poetry. It contains the following fixes:

v4.57.0: Qwen3-Next, Vault Gemma, Qwen3 VL, LongCat Flash, Flex OLMO, LFM2 VL, BLT, Qwen3 OMNI MoE, Parakeet, EdgeTAM, OLMO3

New model additions

Qwen3 Next

The Qwen3-Next series represents the Qwen team's next-generation foundation models, optimized for extreme context length and large-scale parameter efficiency. The series introduces a suite of architectural innovations designed to maximize performance while minimizing computational cost:

  • Hybrid Attention: Replaces standard attention with the combination of Gated DeltaNet and Gated Attention, enabling efficient context modeling.
  • High-Sparsity MoE: Achieves an extremely low activation ratio of 1:50 in MoE layers, drastically reducing FLOPs per token while preserving model capacity.
  • Multi-Token Prediction (MTP): Boosts pretraining performance and accelerates inference.
  • Other Optimizations: Includes techniques such as zero-centered and weight-decayed layernorm, Gated Attention, and other stabilizing enhancements for robust training.

Built on this architecture, the Qwen team trained and open-sourced Qwen3-Next-80B-A3B: 80B total parameters with only 3B active, achieving extreme sparsity and efficiency.

Despite its ultra-efficiency, it outperforms Qwen3-32B on downstream tasks while requiring less than 1/10 of the training cost. Moreover, it delivers over 10x higher inference throughput than Qwen3-32B when handling contexts longer than 32K tokens.

For more details, please see the Qwen3-Next blog post.
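
As a minimal usage sketch (the checkpoint id and generation settings are assumptions, not taken from the release notes):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Qwen3-Next support landed in v4.57.0; the checkpoint id is illustrative.
model_id = "Qwen/Qwen3-Next-80B-A3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" shards the model across available devices (needs accelerate).
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Briefly explain hybrid attention."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```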

... (truncated)


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [transformers](https://github.com/huggingface/transformers) from 4.55.2 to 4.57.3.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](huggingface/transformers@v4.55.2...v4.57.3)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 4.57.3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
dependabot bot added the dependencies and python labels on Nov 25, 2025
meta-cla bot added the cla signed label on Nov 25, 2025
meta-codesync bot commented on Dec 1, 2025

@huydhn has imported this pull request. If you are a Meta employee, you can view this in D88104417.

meta-codesync bot closed this in 49758ea on Dec 2, 2025
dependabot bot (Contributor, Author) commented on behalf of github on Dec 2, 2025

OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting @dependabot ignore this major version or @dependabot ignore this minor version. You can also ignore all major, minor, or patch releases for a dependency by adding an ignore condition with the desired update_types to your config file.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.

meta-codesync bot commented on Dec 2, 2025

@huydhn merged this pull request in 49758ea.

dependabot bot deleted the dependabot/pip/main/transformers-4.57.3 branch on December 2, 2025 at 00:40