
Conversation

@jatinwadhwa921

Backmerging with msft commits

chuteng-quic and others added 30 commits March 20, 2025 09:26
…24026)

### Description
- Add a new run option called `lora_config` to feed the information from the LoRA binary
- Parse and apply the LoRA binary in `OnRunStart`

### Motivation and Context
- Support LoRA adapter binaries with QNN context binary usage
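
For illustration, a minimal sketch of feeding this option from Python via the run-config mechanism. The `lora_config` key comes from this PR; the model path, input name, and the value format (assumed here to be a LoRA adapter binary path) are assumptions:

```python
import numpy as np
import onnxruntime as ort

# Hypothetical QNN-context model path.
sess = ort.InferenceSession("model_qnn_ctx.onnx",
                            providers=["QNNExecutionProvider"])

run_options = ort.RunOptions()
# Key name per this PR; the value format (a LoRA adapter binary path here)
# is an assumption. The EP parses and applies the binary in OnRunStart.
run_options.add_run_config_entry("lora_config", "adapter.bin")

feeds = {"input_ids": np.zeros((1, 8), dtype=np.int32)}  # hypothetical input
outputs = sess.run(None, feeds, run_options)
```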
### Description
<!-- Describe your changes. -->
* Update to TensorRT 10.9
* OSS parser tested (testing method: https://onnxruntime.ai/docs/build/eps.html#note-to-ort-1210-open-sourced-parser-users)

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
This PR applies DP4A to the generation shader, and also supports any
block_size where block_size % 32 == 0.
### Description
Add sliding-window support in cutlass fused attention

### Motivation and Context

The change was originally created by Ye in
microsoft#21926.
I merged the change and resolved some conflicts. I also reverted some of
Ye's changes in kernel_forward.h so that our code stays consistent with
the PyTorch code.
…icrosoft#24103)

### Description
Rename class HIPPinnedAllocator to MIGraphXPinnedAllocator

### Motivation and Context
To align allocators' naming for the MIGraphX EP
…t#24104)

### Description
For newer CMake versions, suppress warnings about incorrect letter case
in package names.

### Motivation and Context
To avoid newer CMake versions reporting that a package name contains
uppercase letters where lowercase is required.
…rosoft#23852)

### Description
To honor the SessionOptions API contract, the ordering of AddConfigOption
and AppendExecutionProvider_OpenVINO should not matter. This PR fixes
that issue.

### Motivation and Context
This PR fixes a regression, introduced in the previous PR, in the
ordering of SessionOptions.
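
A hedged Python sketch of the contract being honored; the model path is hypothetical, and `session.disable_cpu_ep_fallback` is a real session-config key used only as an example here:

```python
import onnxruntime as ort

so = ort.SessionOptions()
# With this fix, config entries take effect regardless of whether they are
# set before or after the OpenVINO EP is appended / selected.
so.add_session_config_entry("session.disable_cpu_ep_fallback", "1")

sess = ort.InferenceSession(
    "model.onnx",  # hypothetical path
    sess_options=so,
    providers=["OpenVINOExecutionProvider"],
)
```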
This adds Max and Average pool operators for webgpu-native. Basically,
this is a rewrite of the corresponding JSEP operators with some
improvements:
1) 'dilations' support
2) pooling with kernelShape.length > 2 for the NHWC format
3) code cleanup

However, there are still a few missing features:
1) 'ceil_mode' support
2) column-major 'storage_order'
3) the 'Indices' output for Max pools.
…ft#24122)

### Description

Move `GetMaxComponents` and `SumVector` into one place.

Fix a bug in `SumVector` (the components were summed in the wrong order):

```diff
-      return "(" + x + ".x + " + x + ".y + " + x + ".w + " + x + ".z" + ")";
+      return "(" + x + ".x + " + x + ".y + " + x + ".z + " + x + ".w" + ")";
```
### Description
It is not common for a dev machine to have MPI installed. Skip the test
if MPI is not installed.

### Motivation and Context

Make it easy to run pytest on a dev machine without the need to skip the
test manually.
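A hedged sketch of the skip pattern (not necessarily the exact check used in this PR):

```python
import shutil

import pytest

# Skip the whole test module when no MPI launcher is available.
mpi_missing = shutil.which("mpirun") is None and shutil.which("mpiexec") is None
pytestmark = pytest.mark.skipif(mpi_missing, reason="MPI is not installed")
```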
### Description
This PR integrates Arm® KleidiAI™ to provide optimized assembly kernels
for matrix multiplication with 4-bit quantized weights. These changes
target the MlasQNBitGemm functions, and can be utilized via the
MatMulNBits operator.
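
For illustration, a hedged sketch of producing a MatMulNBits model that these kernels can serve, using onnxruntime's 4-bit MatMul quantizer; the paths are hypothetical, and the class and its `block_size`/`is_symmetric` parameters are from recent onnxruntime releases (check your version):

```python
import onnx
from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer

model = onnx.load("model_fp32.onnx")  # hypothetical path
# Rewrites eligible MatMul nodes into MatMulNBits with 4-bit block quantization.
quantizer = MatMul4BitsQuantizer(model, block_size=32, is_symmetric=True)
quantizer.process()
quantizer.model.save_model_to_file("model_int4.onnx")
```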
### Description

This PR enables web tests (NPM suite tests) for WebGPU EP.

Some test failures are expected, so the specific job is marked as
"continueOnError".

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
…t#24118)

### Description
<!-- Describe your changes. -->
This PR continues the work started at
microsoft#19401.

### Motivation and Context
An overridable initializer should not have a fixed value baked into a
WebNN model, as it could be changed at runtime. The current check doesn't
validate that the initializer is constant.
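
For context, a hedged Python sketch of what "overridable" means in ORT generally: an initializer that also appears as a graph input can be replaced in the feeds at run time (the model path and the "scale" name are hypothetical):

```python
import numpy as np
import onnxruntime as ort

# Hypothetical model where initializer "scale" is also a graph input,
# i.e. an overridable initializer.
sess = ort.InferenceSession("model.onnx")
x = np.ones((1, 4), dtype=np.float32)

out_default = sess.run(None, {"x": x})  # uses the baked-in "scale" value
out_overridden = sess.run(None, {"x": x,
                                 "scale": np.array(2.0, dtype=np.float32)})
```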
### Description
Deleted the constant SKIP_CUDA_TEST_WITH_DML. It does not seem to be
used anywhere.

### Motivation and Context
The constant SKIP_CUDA_TEST_WITH_DML prevents onnxruntime from being
compiled when both of the flags --use_cuda and --use_dml are set.

Co-authored-by: Andreas Hussing <[email protected]>
Previously, the encoder onnx model added extra initialization for the
decoder to generate the KV cache from the prompt. That is not necessary.
Here we redesign the onnx export for the T5 model to output two separate
models, one for the encoder and one for the decoder.

The Linear layers that generate cross-attention features based on
encoder_hidden_states move into the encoder onnx model. This way, the
encoder does not need to output encoder_hidden_states, and only needs to
output the features for cross attention used in the decoder.

Major changes:
- [x] update t5 onnx export script
- [x] update convert_generation script
- [x] update beam search to support changes of inputs and outputs
(details can be found below)
- [x] add a tiny t5 model, and enable the generation test for T5 in Linux
CI pipelines

Example change in inputs and outputs for one layer model:
**Encoder Inputs**:
- encoder_input_ids: int32 (B, encode_sequence_length)
- encoder_attention_mask: int32 (B, encode_sequence_length)
- ~~decoder_input_ids: int32 (B, 1)~~

**Encoder Outputs**:
- ~~logits: (B, 1, vocab_size)~~
- ~~encoder_hidden_states: (B, encode_sequence_length,
encoder_hidden_size)~~
- ~~present_key_self_0: (B, num_heads, 1, head_size)~~
- ~~present_value_self_0: (B, num_heads, 1, head_size)~~
- present_key_cross_0: (B, num_heads, encode_sequence_length, head_size)
- present_value_cross_0: (B, num_heads, encode_sequence_length,
head_size)

**Decoder Inputs**:
- input_ids: int32 (B, 1)
- ~~encoder_input_ids: int32 (B, encode_sequence_length) (optional for
old format; removed in new format)~~
- encoder_attention_mask: int32 (B, encode_sequence_length)
- ~~encoder_hidden_states: (B, encode_sequence_length,
encoder_hidden_size) (optional for old format; removed in new format)~~
- past_key_self_0: (B, num_heads, past_decode_sequence_length,
head_size)
- past_value_self_0: (B, num_heads, past_decode_sequence_length,
head_size)
- past_key_cross_0: (B, num_heads, encode_sequence_length, head_size)
- past_value_cross_0: (B, num_heads, encode_sequence_length, head_size)

**Decoder Outputs**:
- logits: (B, 1, vocab_size)
- present_key_self_0: (B, num_heads, past_decode_sequence_length + 1,
head_size)
- present_value_self_0: (B, num_heads, past_decode_sequence_length + 1,
head_size)

Known issues:
- Some postprocessing (like converting to use decoder masked MHA, past
and present buffer sharing) is not done. Could be a future work item to
integrate with onnxruntime-genai.

### Motivation and Context

Make the encoder onnx model simpler and more efficient in inference (no
need to output encoder_hidden_states).
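
For illustration, a minimal greedy-decoding sketch against the two exported models, assuming the exact input/output names and orderings listed above, a single layer, fp32 KV caches, batch size 1, an empty initial self-attention cache, and hypothetical file names, token ids, and head dimensions (none of which come from this PR):

```python
import numpy as np
import onnxruntime as ort

encoder = ort.InferenceSession("t5_encoder.onnx")  # hypothetical paths
decoder = ort.InferenceSession("t5_decoder.onnx")
num_heads, head_size = 8, 64                       # hypothetical model dims

ids = np.array([[13, 8, 1]], dtype=np.int32)       # (B, encode_sequence_length)
mask = np.ones_like(ids)

# The encoder runs once and yields the cross-attention KV features.
key_cross, value_cross = encoder.run(
    None, {"encoder_input_ids": ids, "encoder_attention_mask": mask})

past = {
    "past_key_cross_0": key_cross,
    "past_value_cross_0": value_cross,
    # Self-attention cache starts empty: past_decode_sequence_length == 0.
    "past_key_self_0": np.zeros((1, num_heads, 0, head_size), np.float32),
    "past_value_self_0": np.zeros((1, num_heads, 0, head_size), np.float32),
}

token = np.array([[0]], dtype=np.int32)            # decoder start token id
for _ in range(8):                                 # greedy decoding, 8 steps
    logits, past["past_key_self_0"], past["past_value_self_0"] = decoder.run(
        None, {"input_ids": token, "encoder_attention_mask": mask, **past})
    token = np.array([[int(logits[0, -1].argmax())]], dtype=np.int32)
    print(int(token[0, 0]))
```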
### Description
Adding back the missing dist folder.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
This PR moves the CUDA memcpy for the QK output when type `T` is equal
to type `QK` from `attention_impl.cu` into `attention_qk.cu`.

### Motivation and Context
This PR fixes a linkage error when type `T` and type `QK` are the same
in `attention_qk.cu`.
…/nextjs-default (microsoft#24132)

Bumps [next](https://github.com/vercel/next.js) from 15.1.2 to 15.2.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vercel/next.js/releases">next's
releases</a>.</em></p>
<blockquote>
<h2>v15.2.3</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>Update default allowed origins list (<a
href="https://redirect.github.com/vercel/next.js/issues/77212">#77212</a>)</li>
<li>unify allowed origin detection handling (<a
href="https://redirect.github.com/vercel/next.js/issues/77053">#77053</a>)</li>
<li>Add dev warning for cross-origin and stabilize allowedDevOrigins (<a
href="https://redirect.github.com/vercel/next.js/issues/77044">#77044</a>)</li>
<li>Ensure deploymentId is used for CSS preloads (<a
href="https://redirect.github.com/vercel/next.js/issues/77210">#77210</a>)</li>
<li>Update middleware request header (<a
href="https://redirect.github.com/vercel/next.js/issues/77201">#77201</a>)</li>
<li>[metadata] remove the default segement check for metadata rendering
(<a
href="https://redirect.github.com/vercel/next.js/issues/77119">#77119</a>)</li>
<li>[ts-hint] fix vscode type hint plugin enabling (<a
href="https://redirect.github.com/vercel/next.js/issues/77099">#77099</a>)</li>
<li>[metadata] re-insert icons to head for streamed metadata (<a
href="https://redirect.github.com/vercel/next.js/issues/76915">#76915</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/ijjk"><code>@​ijjk</code></a>, <a
href="https://github.com/ztanner"><code>@​ztanner</code></a>, and <a
href="https://github.com/huozhi"><code>@​huozhi</code></a> for
helping!</p>
<h2>v15.2.2</h2>
<h3>Core Changes</h3>
<ul>
<li>[dev-overlay] fix styling on overflow error messages, add button
hover state: <a
href="https://redirect.github.com/vercel/next.js/issues/76771">#76771</a></li>
<li>Fix: respond 405 status code on OPTIONS request to SSG page: <a
href="https://redirect.github.com/vercel/next.js/issues/76767">#76767</a></li>
<li>[dev-overlay] Always show relative paths: <a
href="https://redirect.github.com/vercel/next.js/issues/76742">#76742</a></li>
<li>[metadata] remove the duplicate metadata in the error boundary: <a
href="https://redirect.github.com/vercel/next.js/issues/76791">#76791</a></li>
<li>Upgrade React from <code>d55cc79b-20250228</code> to
<code>443b7ff2-20250303</code>: <a
href="https://redirect.github.com/vercel/next.js/issues/76804">#76804</a></li>
<li>[dev-overlay] Ignore animations on page load: <a
href="https://redirect.github.com/vercel/next.js/issues/76834">#76834</a></li>
<li>fix: remove useless set-cookie in action-handler: <a
href="https://redirect.github.com/vercel/next.js/issues/76839">#76839</a></li>
<li>Turbopack: handle task cancelation: <a
href="https://redirect.github.com/vercel/next.js/issues/76831">#76831</a></li>
<li>Upgrade React from <code>443b7ff2-20250303</code> to
<code>e03ac20f-20250305</code>: <a
href="https://redirect.github.com/vercel/next.js/issues/76842">#76842</a></li>
<li>add types for <code>__next_app__</code> module loading functions: <a
href="https://redirect.github.com/vercel/next.js/issues/74566">#74566</a></li>
<li>fix duplicated noindex when server action is triggered: <a
href="https://redirect.github.com/vercel/next.js/issues/76847">#76847</a></li>
<li>fix: don't drop queued actions when navigating: <a
href="https://redirect.github.com/vercel/next.js/issues/75362">#75362</a></li>
<li>[dev-overlay]: remove dependency on platform for focus trapping: <a
href="https://redirect.github.com/vercel/next.js/issues/76849">#76849</a></li>
<li>Turbopack: Add <strong>turbopack_load_by_url</strong>: <a
href="https://redirect.github.com/vercel/next.js/issues/76814">#76814</a></li>
<li>Add handling of origin in dev mode: <a
href="https://redirect.github.com/vercel/next.js/issues/76880">#76880</a></li>
<li>[dev-overlay] Stop grouping callstack frames into ignored vs. not
ignored: <a
href="https://redirect.github.com/vercel/next.js/issues/76861">#76861</a></li>
<li>Upgrade React from <code>e03ac20f-20250305</code> to
<code>029e8bd6-20250306</code>: <a
href="https://redirect.github.com/vercel/next.js/issues/76870">#76870</a></li>
<li>[dev-overlay] Increase padding if no <code>x</code> button present:
<a
href="https://redirect.github.com/vercel/next.js/issues/76898">#76898</a></li>
<li>fix: prevent incorrect searchParams being applied on certain navs:
<a
href="https://redirect.github.com/vercel/next.js/issues/76914">#76914</a></li>
<li>[dev-overlay] Dim ignore-listed callstack frames when shown: <a
href="https://redirect.github.com/vercel/next.js/issues/76862">#76862</a></li>
</ul>
<h3>Example Changes</h3>
<ul>
<li>chore(cna): update tailwind styles to be closer to non-tw cna: <a
href="https://redirect.github.com/vercel/next.js/issues/76647">#76647</a></li>
</ul>
<h3>Misc Changes</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/vercel/next.js/commit/535e26d3c69de49df8bd17618a424cbe65ec897b"><code>535e26d</code></a>
v15.2.3</li>
<li><a
href="https://github.com/vercel/next.js/commit/2fcae1d7e3079874ff633b5b8311adb584c80ce6"><code>2fcae1d</code></a>
Update default allowed origins list (<a
href="https://redirect.github.com/vercel/next.js/issues/77212">#77212</a>)</li>
<li><a
href="https://github.com/vercel/next.js/commit/adf5462b5f269963395b0a2ef12a1b66e8cadabc"><code>adf5462</code></a>
unify allowed origin detection handling (<a
href="https://redirect.github.com/vercel/next.js/issues/77053">#77053</a>)</li>
<li><a
href="https://github.com/vercel/next.js/commit/5e59da1f5c8b9e8b3a759048bd371efcd77813ae"><code>5e59da1</code></a>
Add dev warning for cross-origin and stabilize allowedDevOrigins (<a
href="https://redirect.github.com/vercel/next.js/issues/77044">#77044</a>)</li>
<li><a
href="https://github.com/vercel/next.js/commit/8151cb6ce921cb1b9faeab6fb88551146dc206b7"><code>8151cb6</code></a>
Ensure deploymentId is used for CSS preloads (<a
href="https://redirect.github.com/vercel/next.js/issues/77210">#77210</a>)</li>
<li><a
href="https://github.com/vercel/next.js/commit/52a078da3884efe6501613c7834a3d02a91676d2"><code>52a078d</code></a>
Update middleware request header (<a
href="https://redirect.github.com/vercel/next.js/issues/77201">#77201</a>)</li>
<li><a
href="https://github.com/vercel/next.js/commit/4698ad6478cc85a7283a8c41edfbba023dadf57d"><code>4698ad6</code></a>
[metadata] remove the default segement check for metadata rendering (<a
href="https://redirect.github.com/vercel/next.js/issues/77119">#77119</a>)</li>
<li><a
href="https://github.com/vercel/next.js/commit/1e1ff403a28703b08e68758cfcbb7b6c97c4bd2a"><code>1e1ff40</code></a>
[ts-hint] fix vscode type hint plugin enabling (<a
href="https://redirect.github.com/vercel/next.js/issues/77099">#77099</a>)</li>
<li><a
href="https://github.com/vercel/next.js/commit/88deb12b03c90f5146b1270cd7bea3517cf90083"><code>88deb12</code></a>
[metadata] re-insert icons to head for streamed metadata (<a
href="https://redirect.github.com/vercel/next.js/issues/76915">#76915</a>)</li>
<li><a
href="https://github.com/vercel/next.js/commit/f4552826e1ed15fbeb951be552d67c5a08ad0672"><code>f455282</code></a>
v15.2.2</li>
<li>Additional commits viewable in <a
href="https://github.com/vercel/next.js/compare/v15.1.2...v15.2.3">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=next&package-manager=npm_and_yarn&previous-version=15.1.2&new-version=15.2.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
Dependabot will merge this PR once CI passes on it, as requested by
@fs-eire.

[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ps (microsoft#24090)

### Description
<!-- Describe your changes. -->

Support shape inference for the QLinearAdd and QLinearMul ops, which were
missing in symbolic_shape_infer.py.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
This change is required to enable shape inference for models with
"QLinearAdd" ops, which are defined in the com.microsoft domain and whose
shapes cannot be inferred using onnx shape_inference alone.

Fixes issue microsoft#24028
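
A hedged usage sketch of the script this PR extends, via its Python entry point (the model path is hypothetical; `SymbolicShapeInference.infer_shapes` and its `auto_merge` parameter are from the onnxruntime tools package):

```python
import onnx
from onnxruntime.tools.symbolic_shape_infer import SymbolicShapeInference

# Hypothetical model containing com.microsoft QLinearAdd / QLinearMul nodes.
model = onnx.load("qlinear_model.onnx")
inferred = SymbolicShapeInference.infer_shapes(model, auto_merge=True)
onnx.save(inferred, "qlinear_model_inferred.onnx")
```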

---------

Signed-off-by: Praveen G <[email protected]>
…ine (microsoft#23580)

### Description
Follow-up to microsoft#23551 

Adds the BrowserStack testing stage for Android to the NuGet packaging
pipeline.

This tests that the produced NuGet package can be imported and works
correctly on an Android device.

[Pipeline run that shows what a failing unit test would look
like](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=670961&view=results)

---------

Co-authored-by: Edward Chen <[email protected]>
### Description
Add fp16 support to sparse attention


### Motivation and Context
Generalize models for CPU and GPU
### Description

This PR refactors the mac CI pipeline:

- Use a composite action and a reusable workflow to consolidate
duplicated code
- Separate each EP
### Description
Create separate template overloads to address the Windows Debug build
warning 'unreachable code'.
…oft#24115)

### Description
This PR introduces a new WebGPU EP option `preserveDevice`.

Before this change, a WebGPU device would be destroyed when no inference
session used it. Destroying a WebGPU device cleans up both the buffer
cache and the shader cache.

With this option introduced, when the option is ON (the default is OFF),
the device is no longer destroyed and is always kept alive. This is
helpful in 2 scenarios:
- a server that is always on
- unit tests, so that bugs caused by an incorrect shader cache may be
detected (thanks to @qjia7 for the suggestion)
…oft#24014)

### Description
<!-- Describe your changes. -->

This gives a way for webapp developers to customize the bundler behavior
regarding whether to bundle the wasm.

To avoid treating ort-wasm-threaded-simd.jsep.mjs and
ort-wasm-threaded-simd.jsep.wasm as dependencies during bundling, use the
import condition `onnxruntime-web-use-extern-wasm`.

For webpack:
```
module.exports = {
  //...
  resolve: {
    conditionNames: ['onnxruntime-web-use-extern-wasm', 'import', 'module'],
  },
};

```

For esbuild:
```
await esbuild.build({
  //...
  conditions: ['onnxruntime-web-use-extern-wasm', 'import', 'module'],
})
```

For rollup:
```
import { nodeResolve } from '@rollup/plugin-node-resolve';

export default {
  //...
  plugins: [nodeResolve({
    exportConditions: ['onnxruntime-web-use-extern-wasm', 'import', 'module', 'development|production']
  })]
};
```


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

- microsoft#24009
…oft#23937)

### Description

Add API for accessing metadata of a model's input/output.

Currently, the implementation only covers the WebAssembly backend and
the Node.js binding. For WebGL, there is so far no plan to implement this
API; for React Native, the implementation will be done later and is not
included in this PR.

#### Example usage:

```js
const mySession = await ort.InferenceSession.create( ... );

console.log(`there are ${mySession.inputMetadata.length} inputs:`);
for (let i = 0; i < mySession.inputMetadata.length; i++) {
  let info;
  if (mySession.inputMetadata[i].isTensor) {
    info = `tensor: ${mySession.inputMetadata[i].type}, shape: ${mySession.inputMetadata[i].shape}`;
  } else {
    info = `non-tensor`;
  }
  console.log(`input ${i}: ${mySession.inputMetadata[i].name}: ${info}`);
}

```

possible output:
```
there are 1 inputs:
input 0: input: tensor: float32, shape: [batch, 3, 224, 224]

```

Resolves:
- microsoft#22682
- microsoft#22949
### Description

Add a cache "onnxnodetests" for node tests.

This fixes random network errors when downloading onnx node test data.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Add native MatMul (`MatMulNaive`, `MatMulPacked` and `MatMulPackedVec4`).


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
The big model pipelines are still using CUDA 11.8. This updates the
pipeline to use CUDA 12.x.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
tianleiwu and others added 3 commits March 25, 2025 07:44
…icrosoft#24151)

### Description
Show a proper error message when an fp16 model is used for Beam Search
on CPU.

Before:
```
2025-02-15 20:15:02.999160115 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running BeamSearch node. Name:'beam_search' Status Message: bad_function_call
```

After:
```
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running BeamSearch node. Name:'beam_search' Status Message: onnxruntime/onnxruntime/contrib_ops/cpu/transformers/beam_search.cc:309 virtual onnxruntime::common::Status onnxruntime::contrib::transformers::BeamSearch::Compute(onnxruntime::OpKernelContext*) const BeamSearch does not support float16 model on CPU execution provider. Use float32 model or CUDA execution provider instead.
```

### Motivation and Context
microsoft#23728
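A hedged note on the workaround the new message points to: use an fp32 model, or run the same fp16 model with the CUDA EP (the model path below is hypothetical):

```python
import onnxruntime as ort

# On CPU this fp16 BeamSearch model now fails with the clear
# RUNTIME_EXCEPTION shown above; preferring the CUDA EP avoids it.
sess = ort.InferenceSession(
    "beam_search_fp16.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
```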
### Description
As titled.



### Motivation and Context
We have the last MatMul in the phi-4-mini onnx model with b_shape =
{3072, 200064}, and
packed_b_size = MlasGemmPackBSize(N, K)
is `3072 * 200064 * sizeof(float) = 2458386432`.
This is larger than 2,147,483,647, so it is out of the int boundary on a
32-bit system, and len then becomes negative.
We therefore change the type to size_t, and the model can be loaded
successfully after the change.
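
A quick worked check of the arithmetic above, with values straight from the description (the exact MlasGemmPackBSize result may additionally include alignment padding):

```python
# b_shape of the last MatMul in phi-4-mini, per the description above.
N, K = 200064, 3072
packed_b_size = N * K * 4            # sizeof(float) == 4 bytes
print(packed_b_size)                 # 2458386432
print(packed_b_size > 2**31 - 1)     # True: a signed 32-bit int goes negative
```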
@jatinwadhwa921 jatinwadhwa921 requested a review from ankitm3k March 25, 2025 16:34
@jatinwadhwa921 jatinwadhwa921 merged commit 2c61a3a into ovep-develop Mar 26, 2025
6 of 11 checks passed
@jatinwadhwa921 jatinwadhwa921 deleted the sync_msft_25_3_25 branch April 15, 2025 05:37