ORT Debug Build Fails to Infer for Every Model #24535


Open

jatinwadhwa921 opened this issue Apr 24, 2025 · 2 comments · Fixed by #24542

Comments

@jatinwadhwa921 (Contributor)

Describe the issue

When building ONNX Runtime in Debug configuration, inference consistently fails for all models tested (e.g., SqueezeNet, AlexNet). The same models run successfully with a Release build. This issue appears to be isolated to the Debug build configuration and is reproducible across different models.

The inference attempt in Debug mode results in the following error (attached below), indicating a possible issue in debug-specific assertions, memory checks, or internal state validations.
[Screenshot: error output from onnxruntime_perf_test]

To reproduce

  1. git clone --recursive https://github.com/microsoft/onnxruntime
  2. cd onnxruntime
  3. build.bat --config Debug --build_shared_lib --parallel
  4. cd build/Windows/Debug/Debug
  5. onnxruntime_perf_test.exe -v -m times -r 1 -I "C:\Users\Administrator\Downloads\squeezenet1.1-7.onnx"
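For reference, the failure path can also be exercised through the C++ API directly (the report lists C++ as the API in use). The snippet below is a hypothetical minimal harness, not taken from the original report: the model path is a placeholder, and it stops at session creation with a comment marking where `Run` would go, whereas the reported failure was observed through `onnxruntime_perf_test`.

```cpp
// Hypothetical minimal C++ API check against a Debug build of onnxruntime.dll.
// The model path is a placeholder; per the report, any model reproduces.
#include <onnxruntime_cxx_api.h>
#include <iostream>

int main() {
  try {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "debug-build-repro");
    Ort::SessionOptions options;
    Ort::Session session(env, L"squeezenet1.1-7.onnx", options);
    std::cout << "Session created successfully" << std::endl;
    // Ort::Session::Run(...) would follow here with model-specific inputs;
    // the reported failure was observed when running onnxruntime_perf_test.
  } catch (const Ort::Exception& e) {
    std::cerr << "ONNX Runtime error: " << e.what() << std::endl;
    return 1;
  }
  return 0;
}
```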

Urgency

No response

Platform

Windows

OS Version

Windows

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

main

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@snnn (Member)

snnn commented Apr 24, 2025

I tried the latest code and ran onnx_test_runner and onnxruntime_perf_test with a ResNet-50 model. Neither found any issue.

onnxruntime_perf_test.exe -t 10 C:\a\resnet50-v1-12\resnet50-v1-12.onnx
Session creation time cost: 2.43358 s
First inference time cost: 18 ms
Total inference time cost: 10.0107 s
Total inference requests: 683
Average inference time cost: 14.6569 ms
Total inference run time: 10.0175 s
Number of inferences per second: 68.1808
Avg CPU usage: 49 %
Peak working set size: 302383104 bytes
Avg CPU usage:49
Peak working set size:302383104
Runs:683
Min Latency: 0.0101027 s
Max Latency: 0.025222 s
P50 Latency: 0.0143986 s
P90 Latency: 0.0185777 s
P95 Latency: 0.0193943 s
P99 Latency: 0.0221765 s
P999 Latency: 0.025222 s
memleakdbg:
----- No memory leaks detected -----

The model was downloaded from the ONNX Model Zoo.

Here is my build command:

python tools\ci_build\build.py  --config Debug --build_dir b1575555 --skip_submodule_sync --build_csharp --update --build --parallel --cmake_generator "Visual Studio 17 2022" --build_shared_lib  --build_wheel --msvc_toolset 14.42 --use_binskim_compliant_compile_flags --use_vcpkg

edgchen1 added a commit that referenced this issue Apr 25, 2025
### Description

Fix memleakdbg call stack output.

The call stack output was getting clobbered:

`C:\dev\onnxruntime\build\Debug\_deps\googletest-src\googletest\include\gtest\internal\gtest-port.h(1631):
l\gtest-port.h(1631): eadLocal<testing::Sequence *>::GetOrCreateValue`

I think the issue is that this aliasing of `buffer` and `symbol`:

https://github.com/microsoft/onnxruntime/blob/173a11a4e7a2f7a360c9db6abbe601a06a16f004/onnxruntime/core/platform/windows/debug_alloc.cc#L97-L100

does not play nicely with a call to `_snprintf_s` like this:

https://github.com/microsoft/onnxruntime/blob/173a11a4e7a2f7a360c9db6abbe601a06a16f004/onnxruntime/core/platform/windows/debug_alloc.cc#L115

The clobbered output does not match the predefined ignore patterns, so we see spurious memory leak check output.
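As a rough illustration (a simplified sketch, not the actual `debug_alloc.cc` code, which uses `_snprintf_s` and a `SYMBOL_INFO`-backed buffer), the problem pattern is formatting into a buffer while one of the format arguments points into that same buffer:

```cpp
// Simplified sketch of the suspected overlap; std::snprintf stands in for
// _snprintf_s to keep the example portable.
#include <cstdio>
#include <iostream>

int main() {
  // One buffer both backs the symbol text and is the destination of the
  // later formatted write, roughly mirroring the aliasing described above.
  char buffer[256];
  std::snprintf(buffer, sizeof(buffer),
                "ThreadLocal<testing::Sequence *>::GetOrCreateValue");
  const char* symbol = buffer;  // alias into the same storage

  // Source and destination now overlap: the "file(line): " prefix overwrites
  // the symbol text while it is still being read, producing interleaved,
  // clobbered output like the gtest-port.h line quoted above.
  std::snprintf(buffer, sizeof(buffer), "gtest-port.h(1631): %s", symbol);

  std::cout << buffer << "\n";
  return 0;
}
```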

This change updates the memleakdbg output generation to use C++ ostreams instead of fixed-size buffers and `_snprintf_s`.
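A minimal sketch of that approach (assumed for illustration; `FormatFrame` is a hypothetical helper, not the PR's actual code): building each line in a `std::ostringstream` gives the formatted text its own storage, so it can never overlap the symbol-name buffer it reads from.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper illustrating the ostream-based formatting style.
std::string FormatFrame(const char* file, int line, const char* symbol_name) {
  std::ostringstream oss;
  oss << file << "(" << line << "): " << symbol_name;
  return oss.str();  // independent storage; no aliasing with the inputs
}
```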

### Motivation and Context

Fix spurious mem leak check output.
Fix #24535.
@snnn snnn reopened this Apr 25, 2025
@snnn (Member)

snnn commented Apr 25, 2025

The screenshot was from onnxruntime_perf_test. @jatinwadhwa921 , do you have more details?
