[Release2.6][SWDEV-479939] Added sleep statement for Navi archs for vectorized operation on CPU #2388

Open: wants to merge 3 commits into base: release/2.6

@akashveramd commented Jul 19, 2025

Created this PR to fix the failing test `test_sharded_grad_scaler_found_inf`, tracked in Jira ticket https://ontrack-internal.amd.com/browse/SWDEV-479939.

The test fails only on Navi architectures, and only when it runs with `cpu_offload=true`. Setting `cpu_offload=true` causes the grad scaler's optimizer step to run on the CPU. To find inf values in tensors, the grad scaler uses both vectorized and scalar operations. On Navi, the vectorized operation appears to run unreliably, possibly because it takes longer to execute, and this leads to the failure. Adding a sleep statement in the vectorized operation allows it to complete successfully.
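To make the "vectorized and scalar operations" above concrete, here is a minimal pure-Python sketch of the usual two-path structure of a found-inf check: full fixed-width blocks go through a "vectorized" path, and the leftover elements go through a scalar path. This is an illustration only, not PyTorch's actual CPU kernel or `ShardedGradScaler` code. `LANES`, `check_and_unscale`, and the helper functions are hypothetical names, and the sketch does not reproduce the Navi failure or the sleep workaround.

```python
import math
from typing import List, Tuple

# Hypothetical block width standing in for a SIMD register width.
LANES = 8


def _block_has_nonfinite(block: List[float]) -> bool:
    """'Vectorized' path (simulated): test a whole fixed-width block at once."""
    return not all(math.isfinite(x) for x in block)


def _scalar_is_nonfinite(x: float) -> bool:
    """Scalar path: test a single leftover element."""
    return not math.isfinite(x)


def check_and_unscale(grads: List[float], inv_scale: float) -> Tuple[List[float], bool]:
    """Return (grads multiplied by inv_scale, found_inf).

    found_inf is True if any element is inf or NaN, whichever path saw it.
    """
    found_inf = False
    out: List[float] = []
    n_full = len(grads) - len(grads) % LANES  # elements covered by full blocks

    # Block path over every full block of LANES elements.
    for start in range(0, n_full, LANES):
        block = grads[start:start + LANES]
        found_inf |= _block_has_nonfinite(block)
        out.extend(x * inv_scale for x in block)

    # Scalar path over the tail that does not fill a whole block.
    for x in grads[n_full:]:
        found_inf |= _scalar_is_nonfinite(x)
        out.append(x * inv_scale)

    return out, found_inf


grads = [2.0] * 19  # 19 = 2 full blocks of 8 + a 3-element tail
print(check_and_unscale(grads, 0.5)[1])
```

In this sketch an inf at index 3 of a 19-element list is only seen by the block path, while a NaN at index 18 is only seen by the scalar tail, so the two paths can be exercised separately by choosing where the non-finite value sits.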

@akashveramd self-assigned this Jul 19, 2025

rocm-repo-management-api bot commented Jul 19, 2025

Jenkins build for 72dc0f072a8a9007122a8747a770a722a7838d4e commit finished as NOT_BUILT
Links: Blue Ocean view / Build artifacts


rocm-repo-management-api bot commented Jul 19, 2025

Jenkins build for 72dc0f072a8a9007122a8747a770a722a7838d4e commit finished as FAILURE
Links: Blue Ocean view / Build artifacts
