Commit 49758ea
Update(deps): Bump transformers from 4.55.2 to 4.57.3 (#2658)
Summary:
Bumps [transformers](https://github.com/huggingface/transformers) from 4.55.2 to 4.57.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/huggingface/transformers/releases">transformers's releases</a>.</em></p>
<blockquote>
<h2>Patch release v4.57.3</h2>
<p>This release fixes a hidden bug when loading models with <code>local_files_only=True</code>, as well as a typo related to the recent patch.</p>
<p>The main fix is: <a href="https://github.com/huggingface/transformers/commit/b6055550a15a8fab367cf983b743ff68cc58d81a">https://github.com/huggingface/transformers/commit/b6055550a15a8fab367cf983b743ff68cc58d81a</a>.</p>
<p>We are really sorry that this slipped through; our CIs simply did not catch it.</p>
<p>As it affects a lot of users, we are going to yank the previous release.</p>
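<p>For reference, the affected code path is the offline loading flow. A minimal sketch, assuming a checkpoint that is already in the local cache (the model id below is only a placeholder):</p>
<pre><code class="language-python">
# Minimal sketch of the loading path affected by the v4.57.3 fix: resolving a
# model from the local cache only, without contacting the Hub.
# "bert-base-uncased" is a placeholder id; use a checkpoint you have cached.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", local_files_only=True)
model = AutoModel.from_pretrained("bert-base-uncased", local_files_only=True)
</code></pre>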
<h2>Patch Release v4.57.2</h2>
<p>This patch most notably fixes an issue with some Mistral tokenizers. It contains the following commits:</p>
<ul>
<li>Add AutoTokenizer mapping for mistral3 and ministral (<a href="https://redirect.github.com/huggingface/transformers/issues/42198">#42198</a>)</li>
<li>Auto convert tekken.json (<a href="https://redirect.github.com/huggingface/transformers/issues/42299">#42299</a>)</li>
<li>fix tekken pattern matching (<a href="https://redirect.github.com/huggingface/transformers/issues/42363">#42363</a>)</li>
<li>Check model inputs - hidden states (<a href="https://redirect.github.com/huggingface/transformers/issues/40994">#40994</a>)</li>
<li>Remove invalid <code>staticmethod</code> from module-level get_device_and_memory_breakdown (<a href="https://redirect.github.com/huggingface/transformers/issues/41747">#41747</a>)</li>
</ul>
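<p>For context, these commits touch the standard <code>AutoTokenizer</code> resolution path. A minimal sketch, assuming a Ministral checkpoint on the Hub (the repo id below is an assumption, not taken from the release notes):</p>
<pre><code class="language-python">
# Minimal sketch of the AutoTokenizer path fixed in v4.57.2: mistral3 and
# ministral checkpoints now resolve through AutoTokenizer, including automatic
# conversion of a tekken.json tokenizer file.
# The repo id is an assumed example; substitute the checkpoint you actually use.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Ministral-8B-Instruct-2410")
print(tokenizer("Bonjour, le monde !")["input_ids"])
</code></pre>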
<h2>Patch release v4.57.1</h2>
<p>This patch most notably fixes an issue with an optional dependency (<code>optax</code>), which resulted in parsing errors with <code>poetry</code>. It contains the following fixes:</p>
<ul>
<li><a href="https://github.com/huggingface/transformers/commit/0645c9ec3188e000aecf5060e2cdabcc156bb794">fix optax dep issue</a></li>
<li><a href="https://github.com/huggingface/transformers/commit/a92b1e8a45e1863b95c5e2caa12f5597aee80279">remove offload_state_dict from kwargs</a></li>
<li>Fix bnb fsdp loading for pre-quantized checkpoint (<a href="https://redirect.github.com/huggingface/transformers/issues/41415">#41415</a>)</li>
<li>Fix tests fsdp (<a href="https://redirect.github.com/huggingface/transformers/issues/41422">#41422</a>)</li>
<li>Fix trainer for py3.9 (<a href="https://redirect.github.com/huggingface/transformers/issues/41359">#41359</a>)</li>
</ul>
<h2>v4.57.0: Qwen3-Next, Vault Gemma, Qwen3 VL, LongCat Flash, Flex OLMO, LFM2 VL, BLT, Qwen3 OMNI MoE, Parakeet, EdgeTAM, OLMO3</h2>
<h2>New model additions</h2>
<h3>Qwen3 Next</h3>
<p>The Qwen3-Next series represents the Qwen team's next-generation foundation models, optimized for extreme context length and large-scale parameter efficiency.
The series introduces a suite of architectural innovations designed to maximize performance while minimizing computational cost:</p>
<ul>
<li><strong>Hybrid Attention</strong>: Replaces standard attention with the combination of <strong>Gated DeltaNet</strong> and <strong>Gated Attention</strong>, enabling efficient context modeling.</li>
<li><strong>High-Sparsity MoE</strong>: Achieves an extremely low activation ratio of 1:50 in MoE layers, drastically reducing FLOPs per token while preserving model capacity.</li>
<li><strong>Multi-Token Prediction (MTP)</strong>: Boosts pretraining performance and accelerates inference.</li>
<li><strong>Other Optimizations</strong>: Includes techniques such as <strong>zero-centered and weight-decayed layernorm</strong>, <strong>Gated Attention</strong>, and other stabilizing enhancements for robust training.</li>
</ul>
<p>Built on this architecture, the Qwen team trained and open-sourced Qwen3-Next-80B-A3B (80B total parameters, only 3B active), achieving extreme sparsity and efficiency.</p>
<p>Despite its ultra-efficiency, it outperforms Qwen3-32B on downstream tasks — while requiring <strong>less than 1/10 of the training cost</strong>.
Moreover, it delivers over <strong>10x higher inference throughput</strong> than Qwen3-32B when handling contexts longer than 32K tokens.</p>
<p>For more details, please see the Qwen3-Next <a href="https://qwenlm.github.io/blog/qwen3_next/">blog post</a>.</p>
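<p>A minimal usage sketch with transformers &gt;= 4.57.0; the Hub repo id below is assumed from the model name in these notes and may differ from the published checkpoint:</p>
<pre><code class="language-python">
# Minimal sketch: loading a Qwen3-Next checkpoint and generating a few tokens.
# The repo id is an assumption based on the "Qwen3-Next-80B-A3B" name above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Next-80B-A3B-Instruct"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # shard across available devices
)

inputs = tokenizer("Qwen3-Next pairs Gated DeltaNet with Gated Attention", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
</code></pre>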
<ul>
<li>Adding Support for Qwen3-Next by <a href="https://github.com/bozheng-hit"><code>@bozheng-hit</code></a> in <a href="https://redirect.github.com/huggingface/transformers/issues/40771">#40771</a></li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/huggingface/transformers/commit/47b0e478f324b54f177ea7998a0791870fdd0324"><code>47b0e47</code></a> 4.57.3</li>
<li><a href="https://github.com/huggingface/transformers/commit/d3ee5e8f146ef9fb7299a61a5e406b49bf6b460c"><code>d3ee5e8</code></a> [<code>Mistral Tokenizers</code>] Fix tokenizer detection (<a href="https://redirect.github.com/huggingface/transformers/issues/42389">#42389</a>)</li>
<li><a href="https://github.com/huggingface/transformers/commit/2915fb36cf8a48cad730f444b6057d18a6176d59"><code>2915fb3</code></a> Release v4.57.2</li>
<li><a href="https://github.com/huggingface/transformers/commit/2a59904a7fad16f421e4f0df75499a96004704b9"><code>2a59904</code></a> fix tekken pattern matching (<a href="https://redirect.github.com/huggingface/transformers/issues/42363">#42363</a>)</li>
<li><a href="https://github.com/huggingface/transformers/commit/7e66db75af633ecf0703c70b05c54308f65b705d"><code>7e66db7</code></a> Auto convert tekken.json (<a href="https://redirect.github.com/huggingface/transformers/issues/42299">#42299</a>)</li>
<li><a href="https://github.com/huggingface/transformers/commit/311807f30c78f921247a532bc4c6690d8d96686f"><code>311807f</code></a> Remove invalid <code>staticmethod</code> from module-level get_device_and_memory_breakd...</li>
<li><a href="https://github.com/huggingface/transformers/commit/804038f17aacf65525ea12f28bbb722a1a2bbc47"><code>804038f</code></a> Add AutoTokenizer mapping for mistral3 and ministral (<a href="https://redirect.github.com/huggingface/transformers/issues/42198">#42198</a>)</li>
<li><a href="https://github.com/huggingface/transformers/commit/ede92a8755e48da7ae1d1b7d976ad581aa5c8327"><code>ede92a8</code></a> Check model inputs - hidden states (<a href="https://redirect.github.com/huggingface/transformers/issues/40994">#40994</a>)</li>
<li><a href="https://github.com/huggingface/transformers/commit/8cb5963cc22174954e7dca2c0a3320b7dc2f4edc"><code>8cb5963</code></a> Release: v4.57.1</li>
<li><a href="https://github.com/huggingface/transformers/commit/c6ae19e0e3960e8be9d14b500195fc3a29f9a097"><code>c6ae19e</code></a> Fix trainer for py3.9 (<a href="https://redirect.github.com/huggingface/transformers/issues/41359">#41359</a>)</li>
<li>Additional commits viewable in <a href="https://github.com/huggingface/transformers/compare/v4.55.2...v4.57.3">compare view</a></li>
</ul>
</details>
<br />
[Dependabot compatibility score](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
Pull Request resolved: #2658
Reviewed By: BoyuanFeng
Differential Revision: D88104417
Pulled By: huydhn
fbshipit-source-id: 76a9f8dcb3d9512953f8b4088ef8329a1a1d24b71
1 file changed, +1/-1 lines: line 13 updated (the transformers pin bumped from 4.55.2 to 4.57.3).