forked from microsoft/onnxruntime
Backmerging with msft commits #627
Merged
Conversation
…24026) ### Description - Add a new run option, `lora_config`, to feed information from the LoRA binary - Parse and apply the LoRA binary in `OnRunStart` ### Motivation and Context - Support LoRA adapter binaries with QNN context binary usage
### Description * Update to TensorRT 10.9 * OSS parser tested (testing method: https://onnxruntime.ai/docs/build/eps.html#note-to-ort-1210-open-sourced-parser-users)
This PR applies DP4A to the generation shader, and also supports any block_size where block_size % 32 == 0.
### Description Add sliding window support in cutlass fused attention ### Motivation and Context The change was previously created by Ye: microsoft#21926 I merged the change and resolved some conflicts. Also reverted some of Ye's changes in kernel_forward.h, so that our code stays consistent with the PyTorch code.
…icrosoft#24103) ### Description Rename class HIPPinnedAllocator to MIGraphXPinnedAllocator ### Motivation and Context To align allocators' naming for the MIGraphX EP
…t#24104) ### Description For newer CMake, suppress warnings about incorrect letter case in package names. ### Motivation and Context Avoid newer CMake reporting that a package name contains capital letters where lowercase is required.
…rosoft#23852) ### Description To honor the SessionOptions API contract, the ordering of AddConfigOption and AppendExecutionProvider_OpenVINO should not matter. This PR fixes that issue. ### Motivation and Context This PR fixes a regression in SessionOptions ordering introduced by the previous PR.
This adds Max and Average pool operators for webgpu-native. Basically, this is a rewrite of the corresponding JSEP operators with some improvements: 1) 'dilations' support 2) pooling with kernelShape.length > 2 for NHWC format 3) code cleanup. However, there are still a few missing features: 1) 'ceil_mode' 2) column-major 'storage_order' 3) 'Indices' output for Max pools.
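For reference, the output spatial size with 'dilations' follows the standard ONNX pooling shape formula. A minimal sketch (hypothetical helper for illustration, not the actual EP code):

```python
import math

def pool_output_size(in_size: int, kernel: int, stride: int = 1,
                     pad_begin: int = 0, pad_end: int = 0,
                     dilation: int = 1, ceil_mode: bool = False) -> int:
    # Dilation stretches the kernel's footprint without adding taps.
    effective_kernel = dilation * (kernel - 1) + 1
    span = in_size + pad_begin + pad_end - effective_kernel
    rounded = math.ceil(span / stride) if ceil_mode else math.floor(span / stride)
    return rounded + 1

# A 7-wide input with a 3-wide kernel has 5 valid positions; dilation 2
# widens the effective footprint to 5, leaving only 3 positions.
```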
…ft#24122) ### Description
Put `GetMaxComponents` and `SumVector` in one place. Fix a bug in `SumVector`:
```diff
- return "(" + x + ".x + " + x + ".y + " + x + ".w + " + x + ".z" + ")";
+ return "(" + x + ".x + " + x + ".y + " + x + ".z + " + x + ".w" + ")";
```
### Description It is not common for a dev machine to have MPI installed. Skip the test if MPI is not installed. ### Motivation and Context Make it easy to run pytest on a dev machine without having to skip the test manually.
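One way to make such a skip automatic, sketched with only the standard library (the detection heuristic here is an assumption for illustration; the real guard could probe mpi4py or an ORT build flag instead):

```python
import shutil
import unittest

# Assume MPI is "installed" when an MPI launcher is on PATH.
MPI_AVAILABLE = (shutil.which("mpirun") is not None
                 or shutil.which("mpiexec") is not None)

class DistributedOpsTest(unittest.TestCase):
    @unittest.skipUnless(MPI_AVAILABLE, "MPI not installed on this machine")
    def test_all_reduce(self):
        # Placeholder body; the real test would launch a collective op.
        self.assertTrue(MPI_AVAILABLE)
```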
### Description This PR integrates Arm® KleidiAI™ to provide optimized assembly kernels for matrix multiplication with 4-bit quantized weights. These changes target the MlasQNBitGemm functions, and can be utilized via the MatMulNBits operator.
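To illustrate what a 4-bit quantized weight matmul involves, here is a minimal dequantization sketch; the nibble layout and per-block scale/zero-point scheme are assumptions for illustration, not the actual MLAS packing that the KleidiAI kernels consume:

```python
import numpy as np

def dequantize_nibbles(packed: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Unpack two 4-bit weights per byte and dequantize: w = (q - zp) * scale."""
    lo = packed & 0x0F
    hi = packed >> 4
    q = np.stack([lo, hi], axis=-1).reshape(-1).astype(np.float32)
    return (q - zero_point) * scale

# Bytes 0x21, 0x43 hold the quantized values 1, 2, 3, 4 (low nibble first).
w = dequantize_nibbles(np.array([0x21, 0x43], dtype=np.uint8), scale=0.5, zero_point=0)
# A reference MatMulNBits would dequantize block-by-block, then do a float matmul;
# the optimized kernels fuse these steps instead.
```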
### Description This PR enables web tests (NPM suite tests) for the WebGPU EP. Some test failures are expected, so the specific job is marked as "continueOnError".
…t#24118) ### Description This PR continues the work started at microsoft#19401. ### Motivation and Context An overridable initializer should not have a fixed value included in a WebNN model, as it could be changed at runtime. The current check does not validate that the initializer is constant.
### Description Deleted the constant SKIP_CUDA_TEST_WITH_DML. It does not seem to be used anywhere. ### Motivation and Context The constant SKIP_CUDA_TEST_WITH_DML prevents onnxruntime from being compiled when both the -use_cuda and -use_dml flags are set. Co-authored-by: Andreas Hussing <[email protected]>
Previously, the encoder onnx model added extra initialization for the decoder to generate the kv cache from the prompt, which is not necessary. Here we redesign the onnx export for the T5 model to output two separate models for the encoder and decoder. The Linear that generates cross features from encoder_hidden_states moves into the encoder onnx model. This way, the encoder does not need to output encoder_hidden_states; it only needs to output the features for cross attention used in the decoder. Major changes:
- [x] update t5 onnx export script
- [x] update convert_generation script
- [x] update beam search to support changes of inputs and outputs (details below)
- [x] add a tiny t5 model, and enable the generation test for T5 in Linux CI pipelines

Example change in inputs and outputs for a one-layer model:

**Encoder Inputs**:
- encoder_input_ids: int32 (B, encode_sequence_length)
- encoder_attention_mask: int32 (B, encode_sequence_length)
- ~~decoder_input_ids: int32 (B, 1)~~

**Encoder Outputs**:
- ~~logits: (B, 1, vocab_size)~~
- ~~encoder_hidden_states: (B, encode_sequence_length, encoder_hidden_size)~~
- ~~present_key_self_0: (B, num_heads, 1, head_size)~~
- ~~present_value_self_0: (B, num_heads, 1, head_size)~~
- present_key_cross_0: (B, num_heads, encode_sequence_length, head_size)
- present_value_cross_0: (B, num_heads, encode_sequence_length, head_size)

**Decoder Inputs**:
- input_ids: int32 (B, 1)
- ~~encoder_input_ids: int32 (B, encode_sequence_length) (optional for old format; removed in new format)~~
- encoder_attention_mask: int32 (B, encode_sequence_length)
- ~~encoder_hidden_states: (B, encode_sequence_length, encoder_hidden_size) (optional for old format; removed in new format)~~
- past_key_self_0: (B, num_heads, past_decode_sequence_length, head_size)
- past_value_self_0: (B, num_heads, past_decode_sequence_length, head_size)
- past_key_cross_0: (B, num_heads, encode_sequence_length, head_size)
- past_value_cross_0: (B, num_heads, encode_sequence_length, head_size)

**Decoder Outputs**:
- logits: (B, 1, vocab_size)
- present_key_self_0: (B, num_heads, past_decode_sequence_length + 1, head_size)
- present_value_self_0: (B, num_heads, past_decode_sequence_length + 1, head_size)

Known issues:
- Some postprocessing (like converting to use decoder masked MHA, past and present buffer sharing) is not done. Could be a future work item to integrate with onnxruntime-genai.

### Motivation and Context Make the encoder onnx model simpler and more efficient in inference (no need to output encoder_hidden_states).
### Description Adding back the missing dist folder
### Description This PR moves the CUDA memcpy for the QK output when type `T` is equal to type `QK` from `attention_impl.cu` into `attention_qk.cu`. ### Motivation and Context This PR fixes a linkage error when type `T` and type `QK` are the same in `attention_qk.cu`.
…/nextjs-default (microsoft#24132) Bumps [next](https://github.com/vercel/next.js) from 15.1.2 to 15.2.3. Release notes and commits: https://github.com/vercel/next.js/compare/v15.1.2...v15.2.3 Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ps (microsoft#24090) ### Description Support shape inference for the QLinearAdd and QLinearMul ops, which were missing in symbolic_shape_infer.py ### Motivation and Context This change is required to enable shape inference for models with "QLinearAdd" ops, which are defined in the com.microsoft domain and whose shapes cannot be inferred using onnx shape_inference alone. Fixes issue microsoft#24028 --------- Signed-off-by: Praveen G <[email protected]>
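QLinearAdd and QLinearMul broadcast their (dequantized) inputs the same way as ONNX Add/Mul, so the output shape follows NumPy-style bidirectional broadcasting. A minimal sketch of that shape rule (illustrative, not the symbolic_shape_infer.py implementation):

```python
from itertools import zip_longest

def broadcast_shape(shape_a: list, shape_b: list) -> list:
    # Align from the trailing dimension; a dim of 1 broadcasts to the other.
    out = []
    for x, y in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if x == 1:
            out.append(y)
        elif y == 1 or x == y:
            out.append(x)
        else:
            raise ValueError(f"incompatible dims {x} and {y}")
    return out[::-1]
```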
…ine (microsoft#23580) ### Description Follow-up to microsoft#23551 Adds the BrowserStack testing stage for Android to the NuGet packaging pipeline. This stage verifies that the produced NuGet package can be imported and works correctly on an Android device [Pipeline run that shows what a failing unit test would look like](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=670961&view=results) --------- Co-authored-by: Edward Chen <[email protected]>
### Description Add fp16 support to sparse attention ### Motivation and Context Generalize models for CPU and GPU
### Description This PR refactors the macOS CI pipeline: - Use composite actions and a reusable workflow to consolidate duplicated code - Separate each EP
### Description Create separate template overloads to address the Windows Debug build warning 'unreachable code'.
…oft#24115) ### Description This PR introduces a new WebGPU EP option `preserveDevice`. Before this change, a WebGPU device is destroyed when no inference session uses it, and destroying a WebGPU device cleans up both the buffer cache and the shader cache. With this option ON (default is OFF), the device is no longer destroyed and is kept alive. This is helpful in 2 scenarios: - A server that is always on - unit tests, so that bugs from an incorrect shader cache may be detected. (thanks to @qjia7 for the suggestion)
…oft#24014) ### Description
This gives webapp developers a way to customize the bundler behavior regarding whether to bundle the wasm. To avoid treating ort-wasm-threaded-simd.jsep.mjs and ort-wasm-threaded-simd.jsep.wasm as dependencies during the bundler build, use the import condition `onnxruntime-web-use-extern-wasm`.

For webpack:
```js
module.exports = {
  //...
  resolve: {
    conditionNames: ['onnxruntime-web-use-extern-wasm', 'import', 'module'],
  },
};
```

For esbuild:
```js
await esbuild.build({
  //...
  conditions: ['onnxruntime-web-use-extern-wasm', 'import', 'module'],
});
```

For rollup:
```js
import { nodeResolve } from '@rollup/plugin-node-resolve';

export default {
  //...
  plugins: [nodeResolve({ exportConditions: ['onnxruntime-web-use-extern-wasm', 'import', 'module', 'development|production'] })],
};
```

### Motivation and Context
- microsoft#24009
…oft#23937) ### Description
Add an API for accessing the metadata of a model's inputs/outputs. Currently, the implementation only covers the WebAssembly backend and the Node.js binding. For WebGL, there is so far no plan to implement this API; for React Native, the implementation will be done later and is not included in this PR.

#### Example usage:
```js
const mySession = await ort.InferenceSession.create( ... );
console.log(`there are ${mySession.inputMetadata.length} inputs:`);
for (let i = 0; i < mySession.inputMetadata.length; i++) {
  let info;
  if (mySession.inputMetadata[i].isTensor) {
    info = `tensor: ${mySession.inputMetadata[i].type}, shape: ${mySession.inputMetadata[i].shape}`;
  } else {
    info = `non-tensor`;
  }
  console.log(`input ${i}: ${mySession.inputMetadata[i].name}: ${info}`);
}
```

possible output:
```
there are 1 inputs:
input 0: input: tensor: float32, shape: [batch, 3, 224, 224]
```

Resolves:
- microsoft#22682
- microsoft#22949
### Description Add a cache "onnxnodetests" for node tests. This fixes random network errors when downloading ONNX node test data.
### Description Add native MatMul (`MatMulNaive`, `MatMulPacked` and `MatMulPackedVec4`)
### Description The big model pipelines are still using CUDA 11.8. This updates them to use CUDA 12.x.
…icrosoft#24151) ### Description
Show a proper error message when an fp16 model is used for Beam Search on CPU.

Before:
```
2025-02-15 20:15:02.999160115 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running BeamSearch node. Name:'beam_search' Status Message: bad_function_call
```

After:
```
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running BeamSearch node. Name:'beam_search' Status Message: onnxruntime/onnxruntime/contrib_ops/cpu/transformers/beam_search.cc:309 virtual onnxruntime::common::Status onnxruntime::contrib::transformers::BeamSearch::Compute(onnxruntime::OpKernelContext*) const BeamSearch does not support float16 model on CPU execution provider. Use float32 model or CUDA execution provider instead.
```

### Motivation and Context
microsoft#23728
### Description
As titled.
### Motivation and Context
We have the last MatMul in phi-4-mini onnx with b_shape = {3072, 200064}.

packed_b_size = MlasGemmPackBSize(N, K);

evaluates to `3072*200064*sizeof(float) = 2458386432`. This is larger than
2,147,483,647, so it overflows the int range on a 32-bit system and `len`
becomes negative. We change the type to `size_t`, and the model loads
successfully after the change.
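The overflow can be reproduced numerically; this is an illustrative Python sketch simulating the C integer types (the actual fix lives in MLAS C++ code):

```python
import ctypes

# Shape of the last MatMul weight in phi-4-mini: b_shape = {3072, 200064}.
K, N = 3072, 200064
packed_b_size = K * N * ctypes.sizeof(ctypes.c_float)  # 2,458,386,432 bytes

# Stored in a signed 32-bit int, the count wraps past INT32_MAX and
# comes out negative, which is what made `len` negative:
wrapped = ctypes.c_int32(packed_b_size).value

# A size_t (unsigned, at least 32 bits) holds the value as-is:
kept = ctypes.c_size_t(packed_b_size).value
```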
ankitm3k approved these changes on Mar 25, 2025