Skip to content

CI: Install latest compute-sanitizer separately from CTK #594

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
May 3, 2025

Conversation

carterbox
Copy link
Contributor

@carterbox carterbox commented Apr 30, 2025

Description

We want to always use the latest compute-sanitizer because there could be bug fixes for the compute-sanitizer, and it is OK to use the newest compute-sanitizer on programs running with old CTK versions.

This PR reverts #593

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Copy link
Contributor

copy-pr-bot bot commented Apr 30, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@carterbox
Copy link
Contributor Author

/ok to test 3f07930

This comment has been minimized.

Copy link
Member

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carterbox
Copy link
Contributor Author

/ok to test 9aa44f9

@leofang leofang added enhancement Any code-related improvements P0 High priority - Must do! CI/CD CI/CD infrastructure labels Apr 30, 2025
@leofang leofang marked this pull request as draft May 1, 2025 02:44
Copy link
Contributor

copy-pr-bot bot commented May 1, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@carterbox
Copy link
Contributor Author

This line needs to become mkdir -p to allow reusing the same folder https://github.com/NVIDIA/cuda-python/pull/594/files#diff-c3ca06d8e9146d2421ea02333d7a174f5a0c195d661e9327a5eb1b30790f6decL54

Actually, we don't want to use the same folder. We want to ensure the folder is clean so that we only get the components that we asked for?

@carterbox
Copy link
Contributor Author

/ok to test 6b02f1a

Copy link
Member

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carterbox sorry for late reply, I have a horrible schedule today. I suggest to not waste time on this PR and merge your other PR instead. This PR would not work without significant overhaul as explained in the offline chat. We’re blowing up the GitHub Cache space because the second invocation of fetch_ctk is separately cached, with the mini ctk from the previous step + sanitizer.

@leofang leofang closed this in #593 May 1, 2025
@leofang
Copy link
Member

leofang commented May 1, 2025

hmmm I dunno why this PR was closed, I feel we can repurpose it if we can fix the caching issue, Daniel explained the latest changes to me offline (which I overlooked, sorry!)

@leofang leofang reopened this May 1, 2025
…nitizer"

This reverts commit bd770e1, reversing
changes made to 19df0d9.
@leofang
Copy link
Member

leofang commented May 2, 2025

/ok to test 287302b

@leofang
Copy link
Member

leofang commented May 2, 2025

/ok to test 1d9749e

@leofang
Copy link
Member

leofang commented May 2, 2025

/ok to test c7f9982

@leofang
Copy link
Member

leofang commented May 2, 2025

/ok to test b9fabb6

@leofang
Copy link
Member

leofang commented May 2, 2025

/ok to test 0451990

@leofang
Copy link
Member

leofang commented May 2, 2025

/ok to test 69b6ee1

@leofang
Copy link
Member

leofang commented May 2, 2025

/ok to test b2c6de3

@leofang
Copy link
Member

leofang commented May 2, 2025

/ok to test 5376e79

@leofang leofang closed this May 2, 2025
@leofang leofang reopened this May 2, 2025
@leofang
Copy link
Member

leofang commented May 2, 2025

/ok to test 5376e79

@leofang leofang marked this pull request as ready for review May 2, 2025 23:00
@leofang leofang requested a review from rwgk May 2, 2025 23:04
rwgk
rwgk previously approved these changes May 2, 2025
Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome that you succeeded in wrestling this down 🚀

@leofang leofang merged commit 90f9269 into NVIDIA:main May 3, 2025
1 check passed
@leofang leofang added this to the cuda-python 12.9.0 & 11.8.7 milestone May 3, 2025
@leofang
Copy link
Member

leofang commented May 3, 2025

Thanks, @carterbox and @rwgk! This should make it look a lot nicer and robust now.

With this PR, each invocation of fetch_ctk is separately cached and properly overlayed in the same filesystem tree (assuming there's no conflicts in the requested components), so we won't blow up the cache size or bundle a 12.9 sanitizer inside a 12.8 mini ctk (which is a bit nerve wrecking). It is still unclear to me why a past state of this PR could lead to failures in the Cython compilation test. I can only assume that we somehow messed up with the file layout.

The biggest challenge is still that fetch_ctk is a universal cross-platform action, despite its innocent looking of being a Bash script-based action. And because we have a vastly different set of runner images (GH-hosted Linux/Windows, with/without GPU, and self-hosted Linux/Windows runners), we are forced to use the lowest common denominator of Bash commands and tools that are available everywhere. Lack of rsync is just one such case.

Copy link

github-actions bot commented May 3, 2025

Doc Preview CI
Preview removed because the pull request was closed or merged.

@carterbox carterbox deleted the dching/latest-sanitizer-alt branch May 5, 2025 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI/CD CI/CD infrastructure enhancement Any code-related improvements P1 Medium priority - Should do
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants