forked from microsoft/onnxruntime
Backmerging with Msft commits #621
Merged
Conversation
### Description
Make [Python-Cuda-Publishing-Pipeline](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1311&_a=summary) 1ES compliant
### Description
This PR changes all references to yarn to npm.
### Motivation and Context
This PR is needed to address all Component Governance issues that ORT is facing.
### Current issues
- [x] use_react_native!(:path => config["reactNativePath"]) returns nil
- [x] For the error `CocoaPods could not find compatible versions for pod "RCTRequired"`, we might need to increase the iOS target version from 13.0 to a higher version.
- [x] For 'react-native' >= 0.73.x, the react-native/react.gradle file is no longer used.
- [x] We need to update to Gradle 7.6 or above to upgrade RN; the gradlew version 7.3.3 that we currently use does not work on RN 71+.
- [x] Instructions on how to integrate React Native have changed since [0.72](https://reactnative.dev/docs/integration-with-existing-apps).
- [x] Error `The new Java toolchain feature cannot be used at the project level in combination with source and/or target compatibility` from Gradle.
- [x] Duplicate class: com.facebook.react.PackageList. Solution: remove `apply from: file("../../node_modules/@react-native-community/cli-platform-android/native_modules.gradle"); applyNativeModulesAppBuildGradle(project)` from the bottom of android/app/build.gradle.
- [x] Need to update OnnxruntimeModuleTest because `ReactApplicationContext` is now an abstract class.
---------
Co-authored-by: Edward Chen <[email protected]>
…osoft#23524)
### Description
Replace microsoft#23445, resolve conflicts, and add one new file.
---------
Co-authored-by: Changming Sun <[email protected]>
### Description
The vars are set by cmake\external\emsdk\emsdk_env.bat.
### Motivation and Context
By default these vars are filtered out by vcpkg to make the build reproducible. However, emscripten's cmake toolchain file needs this information. emcc.bat has the following code:
```
@set EM_PY=%EMSDK_PYTHON%
@if "%EM_PY%"=="" (
  set EM_PY=python
)
```
It doesn't actually work as expected: the line `set EM_PY=python` should be changed to `set EM_PY=python.exe`. We haven't hit this issue because the var EM_PY is usually set.
### Description
[Fix ONNX Runtime Python Test Pipeline](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1164&_a=summary)
…23981) Increases WebGPU operator coverage
### Description
Added ReduceMax and ReduceSum.
### Description
Fix the CMake configuration error that occurs when a dependency brought in via
FetchContent uses find_package(Eigen3 REQUIRED).
Major Changes:
- enable EIGEN_BUILD_CMAKE_PACKAGE
- [optional] rename eigen to Eigen3
### Motivation and Context
The following build error occurs when a dependency uses find_package(Eigen3
REQUIRED):
```
By not providing "FindEigen3.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "Eigen3", but
CMake did not find one.
Could not find a package configuration file provided by "Eigen3" with any
of the following names:
Eigen3Config.cmake
eigen3-config.cmake
Add the installation prefix of "Eigen3" to CMAKE_PREFIX_PATH or set
"Eigen3_DIR" to a directory containing one of the above files. If "Eigen3"
provides a separate development package or SDK, be sure it has been
installed.
```
Eigen needs **EIGEN_BUILD_CMAKE_PACKAGE** enabled when it is pulled in via
FetchContent so that it generates **Eigen3Config.cmake**:
https://gitlab.com/libeigen/eigen/-/blob/master/CMakeLists.txt?ref_type=heads#L213
In addition, Eigen's project name is "Eigen3", and the CMake configuration file it
provides is "Eigen3Config.cmake":
https://gitlab.com/libeigen/eigen/-/blob/master/CMakeLists.txt?ref_type=heads#L36
https://gitlab.com/libeigen/eigen/-/blob/master/CMakeLists.txt?ref_type=heads#L252
So it is best for the FetchContent_Declare name to be consistent with the project
name to avoid potential errors.
Co-authored-by: mingyue <[email protected]>
### Description
Same as microsoft#23169
### Motivation and Context
Same as microsoft#23169
### Description
Add a DNNL GitHub workflow, migrated from the "Windows CPU CI pipeline" in Azure DevOps. This PR also adds "--build_nuget" to test the C# part. However, I then hit an error when building the tests in "test\Microsoft.ML.OnnxRuntime.Tests.NetCoreApp\Microsoft.ML.OnnxRuntime.Tests.NetCoreApp.csproj". The error message was:
```
D:\a\_work\onnxruntime\onnxruntime\csharp\test\Microsoft.ML.OnnxRuntime.Tests.Common\TrainingTest.cs(34,81): error CS0103: The name 'CheckpointState' does not exist in the current context [D:\a\_work\onnxruntime\onnxruntime\csharp\test\Microsoft.ML.OnnxRuntime.Tests.NetCoreApp\Microsoft.ML.OnnxRuntime.Tests.NetCoreApp.csproj]
```
Then I checked the code and couldn't understand how it worked before. In this build, `__TRAINING_ENABLED_NATIVE_BUILD__` is not defined, but the "CheckpointState" class is defined in
https://github.com/microsoft/onnxruntime/blob/main/csharp/src/Microsoft.ML.OnnxRuntime/Training/CheckpointState.shared.cs#L21
and that file is empty when __TRAINING_ENABLED_NATIVE_BUILD__ is not defined. So I don't understand how it could work in a normal build without DNNL. Here is my build command:
```
python tools\ci_build\build.py --config RelWithDebInfo --build_dir dnnlbuild --skip_submodule_sync --build_csharp --parallel --use_binskim_compliant_compile_flags --cmake_generator "Visual Studio 17 2022" --build_shared_lib --enable_onnx_tests --build_wheel --msbuild_extra_options "IncludeMobileTargets=false" --build_nuget --use_vcpkg --use_vcpkg_ms_internal_asset_cache --use_dnnl
```
This PR removes the failing test.
### Description
QNN weight-sharing improvement: only the last session in the weight-sharing group (the session that has both share_ep_contexts and stop_share_ep_contexts enabled) generates the .bin file. The .bin file name is decided by the 1st session, and all generated *_ctx.onnx models point to this single .bin, which avoids post-processing work.
Previously, each session generated a _ctx.onnx model with its own .bin file, so post-processing was required to go through the generated *_ctx.onnx models, find the last generated *_ctx.bin file, update all *_ctx.onnx models to point to that same .bin file, and remove the unused .bin files.
### Description
Previously the build would fail with:
```
CMake Error at build/Android/intermediates/armeabi-v7a/vcpkg/buildtrees/0.vcpkg_dep_info.cmake:15:
Parse error. Expected a newline, got identifier with text "set".
```
…24019)
### Description
Allow specifying `UseIndicesTypeAlias` for `AddIndices` in `ShaderHelper`.
### Description
This change allows more overloads for the `Program::AddIndices` method and makes use of r-value references for parameters when possible. Also fixed the implementation of the `AddInputs` and `AddOutputs` methods to use r-value references for the parameters.
### Description
the `BaseTester::Run` function signature is:
```c++
void BaseTester::Run(ExpectResult expect_result, const std::string& expected_failure_string,
const std::unordered_set<std::string>& excluded_provider_types,
const RunOptions* run_options,
std::vector<std::unique_ptr<IExecutionProvider>>* execution_providers,
ExecutionMode execution_mode,
const Graph::ResolveOptions& options);
```
Its behavior is:
- if the parameter `execution_providers` is empty, it will try to
aggregate all execution providers available in the build and, for each
EP, create an inference session and perform the test.
- if the parameter `execution_providers` is not empty, it will run a
single inference session, using the passed-in `execution_providers` for
the session, and perform the test.
The old code may put multiple EPs into a single inference session, but at
runtime only one EP will actually run the test. Specifically, the WebGPU
EP comes after the CPU EP in this case, so the test never runs on the
WebGPU EP. A usage sketch follows below.
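A minimal, illustrative sketch (not code from this PR) of passing exactly one EP per `Run` call so each session really exercises the intended EP. It assumes the existing test utilities (`OpTester`, `DefaultCpuExecutionProvider()`, `DefaultWebGpuExecutionProvider()`); header paths and helper names are approximate:
```c++
// Illustrative sketch: run the same op test once per EP, one EP per session.
// Header paths and helper names follow the existing test utilities and are approximate.
#include <memory>
#include <vector>
#include "test/providers/provider_test_utils.h"   // OpTester
#include "test/util/include/default_providers.h"  // Default*ExecutionProvider()

using namespace onnxruntime;
using namespace onnxruntime::test;

static void RunAddTestOnEachEp() {
  for (auto ep_factory : {&DefaultCpuExecutionProvider, &DefaultWebGpuExecutionProvider}) {
    OpTester test("Add", 14);
    test.AddInput<float>("A", {2}, {1.0f, 2.0f});
    test.AddInput<float>("B", {2}, {3.0f, 4.0f});
    test.AddOutput<float>("C", {2}, {4.0f, 6.0f});

    std::vector<std::unique_ptr<IExecutionProvider>> execution_providers;
    execution_providers.push_back((*ep_factory)());  // exactly one EP in this session
    test.Run(OpTester::ExpectResult::kExpectSuccess, "", {}, nullptr, &execution_providers);
  }
}
```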
**To reviewers**: if you see **a lot of** changes, click the "setting"
button next to the "Jump to",
<img width="277" alt="image"
src="https://github.com/user-attachments/assets/e8947ffb-f230-4c59-a5b7-36c0aedd2b7c"
/>
and check the "Hide Whitespace" and load it again.
<img width="137" alt="{4D60F676-35F4-4546-B8E1-E2F42411A9E6}"
src="https://github.com/user-attachments/assets/f4c58e6e-c290-49f7-aca7-c413db1e3c77"
/>
### Description
* Fix broadcast of the attention bias on dim 1.
* Increase the test cases in test_mha.py in the pipeline to cover this.
### Motivation and Context
This feature was added in microsoft#21710. There was a bug in computing the offset when the attention bias broadcasts on dim 1 only, in both the CUDA and CPU kernels. It can be triggered when the attention bias shape is [batch_size, 1, sequence_length, total_sequence_length] and batch_size > 1, when the unfused kernel is selected. Note that cudnn flash attention and cutlass fused attention also support attention bias, so the bug in the unfused kernel was not discovered previously.
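For intuition only, a hypothetical sketch (not the actual CPU/CUDA kernel code) of how the bias offset must account for broadcasting: the stride collapses to zero along a broadcast dimension, and when only dim 1 broadcasts, each batch holds a single S x T matrix rather than num_heads of them.
```c++
// Hypothetical sketch of attention-bias indexing with broadcasting on dim 0 and/or dim 1.
// Bias shape is [B or 1, N or 1, S, T]; the return value is the offset of the
// (batch, head) sub-matrix within the bias buffer.
#include <cstddef>

size_t AttentionBiasOffset(size_t batch, size_t head,
                           size_t seq_len, size_t total_seq_len, size_t num_heads,
                           bool broadcast_dim0, bool broadcast_dim1) {
  const size_t matrix_size = seq_len * total_seq_len;
  // When dim 1 broadcasts, the batch stride is matrix_size, not num_heads * matrix_size.
  const size_t batch_stride = broadcast_dim0 ? 0 : (broadcast_dim1 ? 1 : num_heads) * matrix_size;
  const size_t head_stride = broadcast_dim1 ? 0 : matrix_size;
  return batch * batch_stride + head * head_stride;
}
```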
### Description
Fix a warning from analyzers:
```
Theory method 'CanRunInferenceOnAModelDotnetTensors' on test class 'InferenceTest' does not use parameter 'enableParallelExecution'. Use the parameter, or remove the parameter and associated data. (https://xunit.net/xunit.analyzers/rules/xUnit1026)
```
…tithreading scenario (microsoft#24010)
The GPU device is set again at compute function/compute time to handle multithreading scenarios. Consider the following:
- Users can create multiple threads to initialize separate inference sessions on different devices (not just the default device 0).
- Later, additional threads may be spawned to execute inference_session.Run(), which calls this compute function.
Since new threads default to using device 0, it's necessary to explicitly set the correct device to ensure computations run on the intended GPU.
Example code:
````python
provider = [
    [
        ('TensorrtExecutionProvider', {
            'device_id': 0,
        }),
    ],
    [
        ('TensorrtExecutionProvider', {
            'device_id': 1,
        }),
    ]
]

class ThreadObj():
    def __init__(self, model_path: str, iterations: int, idx: int):
        ...
        sess_opt = ort.SessionOptions()
        self.inference_session = ort.InferenceSession(model_path, sess_opt, provider[idx % 2])

    def warmup(self):
        self.inference_session.run(None, self.input)

    def run(self, thread_times, threads_complete):
        for iter in range(self.iterations):
            self.inference_session.run(None, self.input)

def thread_target(obj, thread_times, threads_complete):
    obj.run(thread_times, threads_complete)

...

iterations = 500
num_threads = 13
t_obj_list = []
thread_list = []

for tidx in range(num_threads):
    obj = ThreadObj(model_path, iterations, tidx)
    t_obj_list.append(obj)
    obj.warmup()

for t_obj in t_obj_list:
    thread = threading.Thread(target=thread_target, daemon=True, args=(t_obj, thread_times, threads_complete,))
    thread.start()
    thread_list.append(thread)

...
````
Note: Based on our measurements (using cuda event) on the A100 GPU with CUDA 12, the execution time for `cudaSetDevice` is approximately 0.004 ms, which is negligible and does not impact runtime performance.
…icrosoft#24036)
### Description
Reverting, as this issue disappeared after adopting the newer TRT API. This has been validated by building an ORT 1.20.1/1.21.0 debug build and testing on FRCNN/resnet50 models.
…added for dependencies. (microsoft#24034) Set CMAKE_POLICY_DEFAULT_CMP0069 to NEW to ensure that interprocedural optimization (IPO) flags are added for dependencies. If the OLD behavior is used, the IPO flags are only added for the Intel compiler on Linux.
### Description
Make [Cuda packaging pipeline](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1287&_a=summary) 1ES compliant
### Check List
- [x] pool `onnxruntime-Win-CPU-2022` not found
…ft#24032)
### Description
Add `--webgpu-ep=runtime` to allow building ort-web with both the WebGPU EP and JSEP, while switching between them at runtime via `globalThis.WEBGPU_EP`. This change makes it much easier to do perf comparisons between the WebGPU EP and JSEP.
…atchNeon initialization. (microsoft#24018) Move call to `MLAS_CPUIDINFO::GetCPUIDInfo()` out of `MlasSQNBitGemmDispatchNeon` initialization. Reduce binary size when MatMulNBits op is not included in the build. I believe the side effect of `MLAS_CPUIDINFO::GetCPUIDInfo()` (e.g., initializing a static object) prevents the linker from discarding the code in a build where the associated MLAS functions are unused.
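A hedged, self-contained illustration of the pattern described above (simplified, hypothetical types; not the MLAS code itself): an eager static initializer that calls a side-effecting singleton accessor anchors the whole chain in the binary, while a lazy accessor lets the linker discard it when the op is never referenced.
```c++
// Hypothetical illustration of eager vs. lazy CPU-feature queries.
struct CpuIdInfo {
  static const CpuIdInfo& Get() {
    static CpuIdInfo instance;  // side effect: constructed on first call
    return instance;
  }
  bool has_neon = true;
};

struct QNBitGemmDispatch {
  bool use_neon = false;
};

// Eager form: runs at static-initialization time, so the linker cannot drop the
// CPUID query even if the op that uses the dispatch table is excluded from the build.
// static const QNBitGemmDispatch kDispatchEager{CpuIdInfo::Get().has_neon};

// Lazy form: the CPUID query happens only when the dispatch is actually requested,
// so a build without the op never references CpuIdInfo::Get().
const QNBitGemmDispatch& GetDispatch() {
  static const QNBitGemmDispatch dispatch{CpuIdInfo::Get().has_neon};
  return dispatch;
}
```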
Co-authored-by: Yulong Wang <[email protected]>
### Description
Remove a duplicated file in the Node.js package. microsoft#23956
…rator (microsoft#23944)
### Description
- Added support for custom position ids and attention masks to the GQA CPU operator (fp32 and fp16)
- Added an MLAS eltwise add kernel for mask application for FP32 and FP16
- Added unit tests for the added eltwise add MLAS kernel
- Modified python tests to test the new GQA inputs
### Motivation and Context
Custom position ids and attention mask are required in order to implement speculative decoding in PhiSilica.
### Benchmarks
All the benchmarks are executed on the GQA op configuration that will be used in the PhiSilica speculative decoding scenario, which is as follows:
- num_heads: 32
- kv_num_heads: 32
- do_rotary: 1
- local_window_size: -1
- head_size: 96
- sequence_length: 6
- packed_qkv: True
Benchmarks were executed on Cadmus with a Snapdragon(R) X 12-core X1E80100 @ 3.40 GHz.
In the tables below, column headers are the total sequence length values used for benchmarking, and the rows indicate whether the attention bias was used. Values are average inference time in ms over 100000 runs.
#### Fp16 results
| Total sequence length | 50 | 100 | 250 | 500 | 750 | 1000 | 1500 | 2000 | 2500 | 3000 | 3500 | 4000 |
|:-----------------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|:--------|:--------|:--------|:--------|:--------|
| Without bias | 0.284054 | 0.257449 | 0.275806 | 0.334123 | 0.458324 | 0.614133 | 0.912791 | 1.38585 | 1.92186 | 2.39203 | 2.88808 | 3.46262 |
| With bias | 0.250926 | 0.253072 | 0.279724 | 0.337774 | 0.499058 | 0.585388 | 0.914316 | 1.40701 | 1.87311 | 2.47475 | 3.3906 | 3.47474 |
| Runtime increase | -11.66% | -1.7% | +1.42% | +1.09% | +8.89% | -4.68% | +0.17% | +1.53% | -2.54% | +3.46% | +17.4% | +0.35% |
#### Fp32 results
| Total sequence length | 50 | 100 | 250 | 500 | 750 | 1000 | 1500 | 2000 | 2500 | 3000 | 3500 | 4000 |
|:-----------------|:---------|:---------|:---------|:---------|:---------|:---------|:--------|:--------|:--------|:--------|:--------|:--------|
| Without bias | 0.259049 | 0.270541 | 0.304583 | 0.376708 | 0.554013 | 0.633217 | 1.20696 | 1.65985 | 1.95169 | 2.45807 | 3.05637 | 4.05169 |
| With bias | 0.261631 | 0.268002 | 0.300853 | 0.370452 | 0.529865 | 0.735216 | 1.43493 | 1.4385 | 1.99028 | 2.3858 | 2.99425 | 4.80197 |
| Runtime increase | +1.0% | -0.94% | -1.22% | -1.66% | -4.36% | +16.11% | +18.89% | -13.34% | +1.98% | -2.94% | -2.03% | +18.52% |
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This PR adds some workarounds to enable int64 support for some WebNN backends which don't support the int64 data type:
- Do not fall back ops whose only blocker is the int64 limitation.
- Convert all int64 initializer and input values to int32 and handle potential overflow errors.
- Register all int64 model inputs and outputs as int32 ml-tensor.
- Handle ONNX ops that need input or output conversion between int64 and int32, e.g. ArgMax, ArgMin, Cast, etc.
- Convert int64 output data back to int32.
- Disallow int64 outputs as 'ml-tensor' preferredOutputLocation.
Fixed microsoft#21401
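As a rough sketch of the conversion step (illustrative C++ only, with a hypothetical helper name; the real change lives in the WebNN EP / onnxruntime-web code), the int64 data is narrowed to int32 with an explicit overflow check rather than silent truncation:
```c++
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Convert an int64 buffer to int32 for backends without int64 support,
// throwing if any value cannot be represented in 32 bits.
std::vector<int32_t> ConvertInt64ToInt32(const std::vector<int64_t>& src) {
  std::vector<int32_t> dst;
  dst.reserve(src.size());
  for (int64_t v : src) {
    if (v < std::numeric_limits<int32_t>::min() || v > std::numeric_limits<int32_t>::max()) {
      throw std::range_error("int64 value does not fit in int32; backend cannot run this model");
    }
    dst.push_back(static_cast<int32_t>(v));
  }
  return dst;
}
```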
… Actions (microsoft#24029)
### Description
Convert Windows GPU pipelines and the Windows OpenVino pipeline to GitHub Actions.
…t#23978)
### Description
Fix fp16 const initialization on no-fp16 platforms [such as Raspberry PI](microsoft#23957).
### Motivation and Context
Resolve microsoft#23957
…roupQueryAttention operator (microsoft#23386)
### Description
Add Packed QKV inputs and the do_rotary attribute to GQA.
### Motivation and Context
Packed QKV inputs and the do_rotary attribute are required for certain models.
### Description
This PR re-designs how Whisper is created and supported in ONNX Runtime. The new solution leverages [previous optimization work](microsoft#15473), and it is designed to be used in conjunction with [this work](microsoft/onnxruntime-genai#1229) in ONNX Runtime GenAI.
Some of the added changes include:
- Re-designed export that creates new ONNX models without needing a `WhisperBeamSearch` op
  - Creates one encoder model that also pre-computes the cross-attention KV caches (since they only need to be calculated once)
  - Creates one decoder model that can be used during pre-fill and token generation
  - Creates one jump-times model that can be used for word-level timestamps
  - Removes need for a `WhisperBeamSearch` op to chain the encoder and decoder subgraphs
  - Removes need to duplicate decoder's weights in memory
    - Previous solution with the `WhisperBeamSearch` op created an encoder-decoder-init model and decoder-with-past model. The decoder was duplicated twice, one in each.
  - Removes need for separate logic to export the PyTorch model coming from OpenAI vs. the PyTorch model coming from Hugging Face
- Re-factors common parameters and logic used in CPU and CUDA attention kernels
- Adds `DUMP_STRING` to enable easy logging of intermediate information when running in debug mode to debug a problem. This info is not printed in release mode so it will not impact performance.
- Integrates `DecoderMaskedMultiHeadAttention` into `MultiHeadAttention`
  - Enables past-present buffer sharing in the `MultiHeadAttention` op for improved performance
  - Adds `cache_indirection` and `past_sequence_length` as new optional inputs to `MultiHeadAttention`
  - Adds `output_qk` as new optional output to `MultiHeadAttention`
  - Enables calculating `output_qk` tensor with FP16 or FP32 precision, regardless of the model's precision
- CI tests that run end-to-end across various flag combinations that are used by many customers internally and externally
The existing solutions are still available if desired.
### Known Issues
- The FP32 CPU model with the `WhisperBeamSearch` op and output QK is currently disabled. This is because ONNX Runtime doesn't currently support output QK kernels on CPU, only on CUDA.
- The `DecoderMaskedMultiHeadAttention` CPU kernel has a parity mismatch with the `DecoderMaskedMultiHeadAttention` CUDA kernel.
- Using `DecoderMaskedMultiHeadAttention` for the FP32 CPU model is not enabled. Currently, it uses `MultiHeadAttention` to avoid the parity mismatch issue.
### Motivation and Context
Using the beam search op has made it more difficult to debug and fix errors that are encountered. This new approach is more flexible and more customizable for users (e.g. by running with ONNX Runtime GenAI). It also helps [this issue](microsoft#18216).
---------
Co-authored-by: mindest <[email protected]>
…missing (microsoft#24053)
### Description
When we fail to load a provider shared DLL on Windows, the error is not very specific. Users have to figure out whether the onnxruntime file is missing, a CUDA file is missing, or cudnn is not installed (and perhaps others), and this is just the CUDA provider. It would be far more useful if it said exactly which file is missing so the user can fix the actual problem. This will likely also result in many fewer GitHub issues about this problem, and the ones that are filed will be much easier to fix.
This fix adds a function that tries loading a DLL and its dependencies recursively to figure out which file is missing. It uses the OS dbghelp library to do it and is not very complex.
This also fixes a years-old bug introduced in the change to use FormatMessage in env.cc, where the system error would always be an empty string `error 126 ""` due to passing 0 as the format buffer length. We will now see the more useful `The specified module could not be found.` style error messages.
### Motivation and Context
Previously, if we failed to load the CUDA provider, the error would look like this, which is limited:
`unknown file: error: C++ exception with description " onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "C:\Example\Path\To\Library\onnxruntime_providers_cuda.dll"`
Now it will look like this if cudnn is not installed:
`unknown file: error: C++ exception with description onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : Error loading "C:\Example\Path\To\Library\onnxruntime_providers_cuda.dll" which depends on "cudnn64_9.dll" which is missing. (Error 126: "The specified module could not be found.")`
If CUDA is not installed:
`unknown file: error: C++ exception with description onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : Error loading "C:\Example\Path\To\Library\onnxruntime_providers_cuda.dll" which depends on "cudart64_12.dll" which is missing. (Error 126: "The specified module could not be found.")`
And if onnxruntime_providers_cuda.dll is not installed:
`unknown file: error: C++ exception with description onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : Error loading "C:\Example\Path\To\Library\onnxruntime_providers_cuda.dll" which is missing. (Error 126: "The specified module could not be found.")`
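For reference, a simplified sketch of the general technique, not the implementation in this PR: a hypothetical helper that walks a DLL's import table with the Windows imagehlp/dbghelp APIs (`MapAndLoad`, `ImageDirectoryEntryToData`, `ImageRvaToVa`) and reports the first dependency that fails to load.
```c++
// Hypothetical sketch: report the first import of `dll_path` that cannot be loaded,
// which is usually the file the user is missing. A full solution would recurse into
// each dependency that does load to pinpoint the deepest missing file.
#include <windows.h>
#include <imagehlp.h>
#include <string>
#pragma comment(lib, "imagehlp.lib")

std::string FindFirstMissingImport(const std::string& dll_path) {
  LOADED_IMAGE image{};
  if (!MapAndLoad(dll_path.c_str(), nullptr, &image, TRUE, TRUE)) {
    return dll_path;  // the DLL itself cannot be opened/mapped
  }
  std::string missing;
  ULONG size = 0;
  auto* imports = static_cast<IMAGE_IMPORT_DESCRIPTOR*>(ImageDirectoryEntryToData(
      image.MappedAddress, FALSE, IMAGE_DIRECTORY_ENTRY_IMPORT, &size));
  for (; imports != nullptr && imports->Name != 0; ++imports) {
    auto* name = static_cast<const char*>(ImageRvaToVa(
        reinterpret_cast<PIMAGE_NT_HEADERS>(image.FileHeader), image.MappedAddress,
        imports->Name, nullptr));
    if (name == nullptr) continue;
    HMODULE dep = LoadLibraryExA(name, nullptr, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS);
    if (dep == nullptr) {
      missing = name;  // this dependency (or one of its own imports) failed to load
      break;
    }
    FreeLibrary(dep);
  }
  UnMapAndLoad(&image);
  return missing;  // empty string: every direct import loaded
}
```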
…t#23928)
### Description
* Update range to build SASS on all archs and PTX on the highest arch
* When CUDA >= 12.8, build all archs (including the latest Blackwell)
### Motivation and Context
https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html
https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-feature-list
…23968)
This change reduces the number of staging buffers used for uploading initializers to the GPU. On the one hand, we release the upload staging buffers early. On the other hand, we use the BufferMapExtendedUsages feature of Dawn on UMA GPUs, which allows us to write directly into the destination GPU buffer without a staging buffer. To achieve this, we need to ensure the UMA GPU buffers are mapped at creation. We make BufferManager aware of OnSessionInitializationEnd() so that it can handle buffer Create() and Upload() calls properly.
Credits to @fs-eire for the overall design of the implementation.
### Description
Adds naive implementations of ReduceMin, ReduceProd, ReduceL1, ReduceL2, ReduceLogSum, ReduceSumSquare, and ReduceLogSumExp. Will optimize to use shared memory in a later PR.
### Motivation and Context
Increases WebGPU EP operator coverage.
ankitm3k approved these changes on Mar 18, 2025
ankitm3k left a comment:
lgtm