@jatinwadhwa921

Backmerging with Msft commits

HectorSVC and others added 22 commits March 18, 2025 08:22
…t#24065)

### Description
Add bool support to the EPContext schema to unblock some models.
### Error

```Traceback
/onnxruntime/onnxruntime/core/providers/webgpu/reduction/reduction_ops.cc:146 [allow_multi_axes = true] Axes values must be in the range [-rank, rank-1]. Got: 446098880
```
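The range check behind this message can be sketched as follows. This is a minimal illustration, not the actual code in `reduction_ops.cc`; the helper name `normalize_axes` is hypothetical.

```python
def normalize_axes(axes, rank):
    """Map possibly negative axes into [0, rank-1], rejecting out-of-range values.

    Hypothetical helper for illustration only.
    """
    normalized = []
    for axis in axes:
        if not -rank <= axis <= rank - 1:
            raise ValueError(
                f"Axes values must be in the range [-rank, rank-1]. Got: {axis}"
            )
        # A negative axis counts from the last dimension.
        normalized.append(axis + rank if axis < 0 else axis)
    return normalized
```

With `rank = 3`, the axes `[-1, 0]` normalize to `[2, 0]`, while a value such as `446098880` is rejected.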
### Description
Upgrade current MacOS-13 to 14


### Motivation and Context

- [x] Update the RN to 0.73.x+ to have the newer version of boost

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
Abs and Sign had bfloat16 kernels created but not registered with the
CUDA EP. Additionally Sign bfloat16 didn't work.
* register bfloat16 kernels with CUDA EP
* fix incorrectly named macro by adding 'X' as they add bfloat16
registration
* add specialization for bfloat16 to _Sign
  * copied existing pattern. not sure if there's a better way
* update tests



### Motivation and Context
microsoft#23875
…soft#24086)

### Description

Improve the OrtValue interface typing and change `staticmethod` to
`classmethod` for constructors to follow Python conventions
(https://google.github.io/styleguide/pyguide.html#2174-decision).
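A minimal sketch of why `classmethod` suits alternative constructors. The `Tensor` and `NamedTensor` classes and the `from_list` method are hypothetical stand-ins, not the OrtValue API.

```python
from __future__ import annotations


class Tensor:
    """Hypothetical container used only to illustrate the pattern."""

    def __init__(self, data: list[float]) -> None:
        self.data = data

    @classmethod
    def from_list(cls, values: list[float]) -> Tensor:
        # `cls` is the class the method was called on, so subclasses
        # get instances of their own type without overriding this method.
        return cls(list(values))


class NamedTensor(Tensor):
    """Subclass that inherits the constructor unchanged."""
```

Calling `NamedTensor.from_list([1.0])` returns a `NamedTensor`. A `staticmethod` that hard-coded `Tensor(...)` would return the base class instead.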
…icrosoft#24078)

The DP4AMatMulQuantize shader requires K to be divisible by 128.
Otherwise, we would need to align the scale to have shape
[M, ceil(K / 128)]. To simplify the shader, we apply dp4a matmul only
when K is divisible by 128.
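The arithmetic can be sketched as below. The function names are hypothetical and the block size of 128 is taken from the description above. This is not the shader or the EP's actual eligibility check.

```python
BLOCK = 128  # block size along K, as stated in the commit message


def can_use_dp4a(k: int) -> bool:
    """Hypothetical eligibility check: K must be a multiple of the block size."""
    return k % BLOCK == 0


def scale_shape(m: int, k: int) -> tuple[int, int]:
    """Scale shape [M, ceil(K / 128)] that a padded variant would need."""
    return (m, (k + BLOCK - 1) // BLOCK)  # integer ceiling division
```

For example, `K = 200` would need a scale of shape `[M, 2]` with a partially filled last block, which is the case the restriction avoids.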
### Description

Add macOS ARM64 pipeline for webgpu.

This pipeline is temporary. I created it because the current code
already fails on macOS ARM64 for the WebGPU EP. Adding this pipeline
allows us to check the status of the fix. Once the build passes, this
pipeline will be merged with the existing macOS arm64 pipeline.
…crosoft#23998)

- Renamed all conflicting WebNN methods from `jsep*` to `webnn*`.
- WebNN doesn't need flush(), therefore it doesn't need to set
`jsepBackend`.

This PR addresses issue microsoft/webnn-developer-preview#78
### Description
Enables multithreading on FP16 to FP32 cast operator.



### Motivation and Context
Improves CPU performance on FP16 models that require casting to FP32.
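The partitioning idea can be sketched as follows: split the FP16 buffer into contiguous chunks and convert each chunk on a worker thread. This is only an illustration in Python (where the GIL prevents a real speedup), not the kernel in this change; the function names are hypothetical, and the output values are Python floats rather than a packed FP32 buffer.

```python
import struct
from concurrent.futures import ThreadPoolExecutor


def cast_chunk(raw: bytes) -> list[float]:
    """Decode little-endian IEEE 754 half-precision values from raw bytes."""
    count = len(raw) // 2
    return list(struct.unpack(f"<{count}e", raw))


def cast_fp16_to_fp32(raw: bytes, num_threads: int = 4) -> list[float]:
    """Split the buffer into contiguous chunks and convert them in parallel."""
    count = len(raw) // 2
    if count == 0:
        return []
    per_chunk = -(-count // num_threads)  # ceiling division
    chunks = [
        raw[2 * start : 2 * (start + per_chunk)]
        for start in range(0, count, per_chunk)
    ]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() preserves chunk order, so the output order matches the input.
        parts = list(pool.map(cast_chunk, chunks))
    return [value for part in parts for value in part]
```

Every FP16 value is exactly representable in FP32, so the widening cast is lossless and the chunks can be processed independently.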
### Description
Move Android CI Pipeline to Github Actions
…#23490)

### Description
Cleanup CoreML EP's code to remove the COREML_ENABLE_MLPROGRAM macro.
Also, increase MINIMUM_COREML_VERSION (the first version we support) to 5.
…olve warning (microsoft#23847)

### Description
Removes the namespace from AndroidManifest.xml



### Motivation and Context
- Resolves microsoft#21681
### Description

Use custom implementation for Pow to fix test failures.
…microsoft#24091)

### Description

The pipeline still times out occasionally. Further extend the timeout
to 90 minutes for ARM64-Xcode16-targeting-iphonesimulator.

The build takes quite a while if the entire build cache is missing.

### Motivation and Context

The pipeline sometimes fails because of timeouts. A previous PR,
microsoft#24030, increased the timeout from 60 min to 75 min, but that
does not appear to be enough.
…ft#24108)

### Description

fix test failure in Reduce operators on macOS ARM64

```
[E:onnxruntime:ReduceL1, sequential_executor.cc:572 ExecuteKernel] Non-zero status code returned while running ReduceL1 node. Name:'node1' Status Message: webgpu_context.cc:259 Run Uniform variable[0] (output_size) data type mismatch in program "ReduceL1", Expected: u32, Actual: i32
```
This PR uses a 1D dispatch group size and uses workgroup_idx instead of
workgroup.x|workgroup.y in case they are normalized.
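A sketch of why a flat index is safer, assuming "normalized" means the runtime may reshape a 1D dispatch into a multi-dimensional grid to fit device limits. The helper below is hypothetical and assumes row-major flattening; it is not the WebGPU EP's shader code.

```python
def flat_workgroup_index(workgroup_id, num_workgroups):
    """Recover a linear index from a 3D workgroup id (assumed x-fastest order)."""
    x, y, z = workgroup_id
    nx, ny, _nz = num_workgroups
    return z * nx * ny + y * nx + x
```

The flat index covers `0 .. nx*ny*nz - 1` exactly once regardless of how the grid was reshaped, whereas `workgroup.x` alone would repeat once the dispatch is no longer 1D.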
)

### Description

abs_error is slightly loosened from 0.02 to 0.03 to allow test cases on
macOS arm64 to pass.
### Description
* Add Sum to op builder in QNN-EP
* For now, support is limited to Sum with 2 inputs.

### Motivation and Context
* Enhance QNN-EP support for Sum with two inputs
@jatinwadhwa921 jatinwadhwa921 requested a review from ankitm3k March 20, 2025 15:15
@jatinwadhwa921 jatinwadhwa921 merged commit 2a24806 into ovep-develop Mar 21, 2025
6 of 11 checks passed
@jatinwadhwa921 jatinwadhwa921 deleted the sync_msft_20_3_25 branch April 15, 2025 05:37