Commit 49758ea
Update(deps): Bump transformers from 4.55.2 to 4.57.3 (#2658)
Summary:
Bumps [transformers](https://github.com/huggingface/transformers) from 4.55.2 to 4.57.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/huggingface/transformers/releases">transformers's releases</a>.</em></p>
<blockquote>
<h2>Patch release v4.57.3</h2>
<p>This release fixes a hidden bug when loading models with <code>local_files_only=True</code>, as well as a typo related to the recent patch.</p>
<p>The main fix is: <a href="https://github.com/huggingface/transformers/commit/b6055550a15a8fab367cf983b743ff68cc58d81a">https://github.com/huggingface/transformers/commit/b6055550a15a8fab367cf983b743ff68cc58d81a</a>.</p>
<p>We are really sorry that this slipped through; our CIs simply did not catch it.</p>
<p>As it affects a lot of users, we are going to yank the previous release.</p>
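<p>For reference, the affected code path is the offline loading flow. A minimal sketch, assuming a checkpoint that is already in the local cache (the model id below is only a placeholder):</p>
<pre><code class="language-python">
# Minimal sketch of the loading path affected by the v4.57.3 fix: resolving a
# model from the local cache only, without contacting the Hub.
# "bert-base-uncased" is a placeholder id; use a checkpoint you have cached.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", local_files_only=True)
model = AutoModel.from_pretrained("bert-base-uncased", local_files_only=True)
</code></pre>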
<h2>Patch Release v4.57.2</h2>
<p>This patch most notably fixes an issue with some Mistral tokenizers. It contains the following commits:</p>
<ul>
<li>Add AutoTokenizer mapping for mistral3 and ministral (<a href="https://redirect.github.com/huggingface/transformers/issues/42198">#42198</a>)</li>
<li>Auto convert tekken.json (<a href="https://redirect.github.com/huggingface/transformers/issues/42299">#42299</a>)</li>
<li>fix tekken pattern matching (<a href="https://redirect.github.com/huggingface/transformers/issues/42363">#42363</a>)</li>
<li>Check model inputs - hidden states (<a href="https://redirect.github.com/huggingface/transformers/issues/40994">#40994</a>)</li>
<li>Remove invalid <code>staticmethod</code> from module-level get_device_and_memory_breakdown (<a href="https://redirect.github.com/huggingface/transformers/issues/41747">#41747</a>)</li>
</ul>
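<p>For context, these commits touch the standard <code>AutoTokenizer</code> resolution path. A minimal sketch, assuming a Ministral checkpoint on the Hub (the repo id below is an assumption, not taken from the release notes):</p>
<pre><code class="language-python">
# Minimal sketch of the AutoTokenizer path fixed in v4.57.2: mistral3 and
# ministral checkpoints now resolve through AutoTokenizer, including automatic
# conversion of a tekken.json tokenizer file.
# The repo id is an assumed example; substitute the checkpoint you actually use.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Ministral-8B-Instruct-2410")
print(tokenizer("Bonjour, le monde !")["input_ids"])
</code></pre>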
<h2>Patch release v4.57.1</h2>
<p>This patch most notably fixes an issue with an optional dependency (<code>optax</code>), which resulted in parsing errors with <code>poetry</code>. It contains the following fixes:</p>
<ul>
<li><a href="https://github.com/huggingface/transformers/commit/0645c9ec3188e000aecf5060e2cdabcc156bb794">fix optax dep issue</a></li>
<li><a href="https://github.com/huggingface/transformers/commit/a92b1e8a45e1863b95c5e2caa12f5597aee80279">remove offload_state_dict from kwargs</a></li>
<li>Fix bnb fsdp loading for pre-quantized checkpoint (<a href="https://redirect.github.com/huggingface/transformers/issues/41415">#41415</a>)</li>
<li>Fix tests fsdp (<a href="https://redirect.github.com/huggingface/transformers/issues/41422">#41422</a>)</li>
<li>Fix trainer for py3.9 (<a href="https://redirect.github.com/huggingface/transformers/issues/41359">#41359</a>)</li>
</ul>
<h2>v4.57.0: Qwen3-Next, Vault Gemma, Qwen3 VL, LongCat Flash, Flex OLMO, LFM2 VL, BLT, Qwen3 OMNI MoE, Parakeet, EdgeTAM, OLMO3</h2>
<h2>New model additions</h2>
<h3>Qwen3 Next</h3>
<p>The Qwen3-Next series represents the Qwen team's next-generation foundation models, optimized for extreme context length and large-scale parameter efficiency.
The series introduces a suite of architectural innovations designed to maximize performance while minimizing computational cost:</p>
<ul>
<li><strong>Hybrid Attention</strong>: Replaces standard attention with the combination of <strong>Gated DeltaNet</strong> and <strong>Gated Attention</strong>, enabling efficient context modeling.</li>
<li><strong>High-Sparsity MoE</strong>: Achieves an extremely low activation ratio of 1:50 in MoE layers, drastically reducing FLOPs per token while preserving model capacity.</li>
<li><strong>Multi-Token Prediction (MTP)</strong>: Boosts pretraining performance and accelerates inference.</li>
<li><strong>Other Optimizations</strong>: Includes techniques such as <strong>zero-centered and weight-decayed layernorm</strong>, <strong>Gated Attention</strong>, and other stabilizing enhancements for robust training.</li>
</ul>
<p>Built on this architecture, the Qwen team trained and open-sourced Qwen3-Next-80B-A3B (80B total parameters, only 3B active), achieving extreme sparsity and efficiency.</p>
<p>Despite its ultra-efficiency, it outperforms Qwen3-32B on downstream tasks — while requiring <strong>less than 1/10 of the training cost</strong>.
Moreover, it delivers over <strong>10x higher inference throughput</strong> than Qwen3-32B when handling contexts longer than 32K tokens.</p>
<p>For more details, please see the Qwen3-Next <a href="https://qwenlm.github.io/blog/qwen3_next/">blog post</a>.</p>
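<p>A minimal usage sketch with transformers &gt;= 4.57.0; the Hub repo id below is assumed from the model name in these notes and may differ from the published checkpoint:</p>
<pre><code class="language-python">
# Minimal sketch: loading a Qwen3-Next checkpoint and generating a few tokens.
# The repo id is an assumption based on the "Qwen3-Next-80B-A3B" name above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Next-80B-A3B-Instruct"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # shard across available devices
)

inputs = tokenizer("Qwen3-Next pairs Gated DeltaNet with Gated Attention", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
</code></pre>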
<ul>
<li>Adding Support for Qwen3-Next by <a href="https://github.com/bozheng-hit"><code>@bozheng-hit</code></a> in <a href="https://redirect.github.com/huggingface/transformers/issues/40771">#40771</a></li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/huggingface/transformers/commit/47b0e478f324b54f177ea7998a0791870fdd0324"><code>47b0e47</code></a> 4.57.3</li>
<li><a href="https://github.com/huggingface/transformers/commit/d3ee5e8f146ef9fb7299a61a5e406b49bf6b460c"><code>d3ee5e8</code></a> [<code>Mistral Tokenizers</code>] Fix tokenizer detection (<a href="https://redirect.github.com/huggingface/transformers/issues/42389">#42389</a>)</li>
<li><a href="https://github.com/huggingface/transformers/commit/2915fb36cf8a48cad730f444b6057d18a6176d59"><code>2915fb3</code></a> Release v4.57.2</li>
<li><a href="https://github.com/huggingface/transformers/commit/2a59904a7fad16f421e4f0df75499a96004704b9"><code>2a59904</code></a> fix tekken pattern matching (<a href="https://redirect.github.com/huggingface/transformers/issues/42363">#42363</a>)</li>
<li><a href="https://github.com/huggingface/transformers/commit/7e66db75af633ecf0703c70b05c54308f65b705d"><code>7e66db7</code></a> Auto convert tekken.json (<a href="https://redirect.github.com/huggingface/transformers/issues/42299">#42299</a>)</li>
<li><a href="https://github.com/huggingface/transformers/commit/311807f30c78f921247a532bc4c6690d8d96686f"><code>311807f</code></a> Remove invalid <code>staticmethod</code> from module-level get_device_and_memory_breakd...</li>
<li><a href="https://github.com/huggingface/transformers/commit/804038f17aacf65525ea12f28bbb722a1a2bbc47"><code>804038f</code></a> Add AutoTokenizer mapping for mistral3 and ministral (<a href="https://redirect.github.com/huggingface/transformers/issues/42198">#42198</a>)</li>
<li><a href="https://github.com/huggingface/transformers/commit/ede92a8755e48da7ae1d1b7d976ad581aa5c8327"><code>ede92a8</code></a> Check model inputs - hidden states (<a href="https://redirect.github.com/huggingface/transformers/issues/40994">#40994</a>)</li>
<li><a href="https://github.com/huggingface/transformers/commit/8cb5963cc22174954e7dca2c0a3320b7dc2f4edc"><code>8cb5963</code></a> Release: v4.57.1</li>
<li><a href="https://github.com/huggingface/transformers/commit/c6ae19e0e3960e8be9d14b500195fc3a29f9a097"><code>c6ae19e</code></a> Fix trainer for py3.9 (<a href="https://redirect.github.com/huggingface/transformers/issues/41359">#41359</a>)</li>
<li>Additional commits viewable in <a href="https://github.com/huggingface/transformers/compare/v4.55.2...v4.57.3">compare view</a></li>
</ul>
</details>
<br />
[Dependabot compatibility score](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
</details>
Pull Request resolved: #2658
Reviewed By: BoyuanFeng
Differential Revision: D88104417
Pulled By: huydhn
fbshipit-source-id: 76a9f8dcb3d9512953f8b4088ef8329a1a1d24b71
1 file changed, +1/-1 lines: line 13 updated (the transformers pin bumped from 4.55.2 to 4.57.3).