Update qMoE spec to support block quantization #25641

tianleiwu · 2025-08-03T20:36:28Z

Update operator spec to support block quantization in qMoE.
Implementation will come later.

Adds the following commits to the release-1.23.2 branch for ORT 1.23.2:

- [TensorRT] Fix DDS output bug during engine update
  - PR: #26272
  - commit id: 00e85dd
- Fix shape inference failure with in-memory external data
  - PR: #26263
  - commit id: d955476
- [CUDA] replace 90a-virtual by 90-virtual for forward compatible
  - PR: #26230
  - commit id: b58911f
- [QNN-EP] Fix logic flow bug
  - PR: #26148
  - commit id: b282379
- Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt
  - PR: #26103
  - commit id: 7362518
- Update qMoE spec to support block quantization
  - PR: #25641
  - commit id: 7a8ffa8
- [VitisAI] add new api to VitisAI to save graph as a string
  - PR: #25602
  - commit id: 3361d72
- [Build] Lock torch, onnxscript and onnx-ir versions to latest
  - PR: #26315
  - commit id: ea69c4d

---------

Co-authored-by: Hariharan Seshadri <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Yateng Hong <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: quic-calvnguy <[email protected]>
Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
Co-authored-by: yifei410 <[email protected]>
Co-authored-by: yifei <[email protected]>

apsonawane · 2025-10-21T23:42:50Z

Cherry-picked for 1.23.2. Removing the release tag and adding the cherry-pick tag.

update qMoE to support block_size

432ed68

tianleiwu force-pushed the tlwu/block_wise_qmoe branch from 8b9e59b to 432ed68 August 3, 2025 20:48

tianleiwu added 3 commits August 3, 2025 14:27

fix ,

dec42f6

format

f209bb5

update doc

b532a76

tianleiwu requested a review from kunal-vaishnavi August 4, 2025 03:18

kunal-vaishnavi approved these changes Aug 4, 2025

tianleiwu merged commit 59871e3 into main Aug 4, 2025
92 checks passed

tianleiwu deleted the tlwu/block_wise_qmoe branch August 4, 2025 15:58

sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request Aug 11, 2025

Update qMoE spec to support block quantization (microsoft#25641)

fb2659a

Update operator spec to support block quantization in qMoE. Implementation will come later.

kunal-vaishnavi mentioned this pull request Aug 14, 2025

Add OpenAI's gpt-oss to ONNX Runtime GenAI microsoft/onnxruntime-genai#1678

Merged

gedoensmax pushed a commit to gedoensmax/onnxruntime that referenced this pull request Sep 2, 2025

Update qMoE spec to support block quantization (microsoft#25641)

9b23fe0

Update operator spec to support block quantization in qMoE. Implementation will come later.

kunal-vaishnavi added the release:1.23.2 label Oct 9, 2025

apsonawane pushed a commit that referenced this pull request Oct 17, 2025

Update qMoE spec to support block quantization (#25641)

7a8ffa8

Update operator spec to support block quantization in qMoE. Implementation will come later.

apsonawane mentioned this pull request Oct 17, 2025

ORT 1.23.2 cherrypick 1 #26347

Closed

apsonawane pushed a commit that referenced this pull request Oct 20, 2025

Update qMoE spec to support block quantization (#25641)

de822ac

Update operator spec to support block quantization in qMoE. Implementation will come later.

apsonawane mentioned this pull request Oct 20, 2025

ORT 1.23.2 cherrypick 1 #26368

Merged

apsonawane added the cherry-picked label and removed the release:1.23.2 label Oct 21, 2025


3 participants
