Add distributed CI job (4xH100) and example unit tests #1106
oulgen left a comment:
Failing tests.
Force-pushed from 426c29d to 683ef76.
"image": "nvidia/cuda:13.0.1-devel-ubuntu24.04",
"runtime-version": "cu130",
"container-options": "--gpus all",
"pytorch-version": "pytorch-nightly",
Should we just use a pinned PyTorch version for this? That would avoid building Triton altogether.
Yeah, I think the PyTorch symmetric memory (symm-mem) library is still being actively improved, and using PyTorch nightly ensures that we can iterate on symm-mem and Helion in lockstep and pick up the latest features.
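Because the job intentionally tracks nightly instead of a pinned release, a cheap guard is to confirm at job start that the installed wheel really is a nightly built for the CUDA runtime named in the matrix entry's `runtime-version`. A minimal sketch, assuming nightly version strings of the form `X.Y.Z.devYYYYMMDD+cuNNN`; `parse_torch_version` and `check_matrix_entry` are hypothetical helpers, not part of this PR.

```python
import re
from typing import NamedTuple, Optional


class TorchBuild(NamedTuple):
    base: str                # release part, e.g. "2.10.0"
    dev_date: Optional[str]  # "YYYYMMDD" for nightly builds, None for releases
    local: Optional[str]     # local version tag such as "cu130", None if absent


# Assumed format: nightly is X.Y.Z.devYYYYMMDD+cuNNN; releases drop the .dev part.
_VERSION_RE = re.compile(r"^(\d+\.\d+\.\d+)(?:\.dev(\d{8}))?(?:\+(\w+))?$")


def parse_torch_version(version: str) -> TorchBuild:
    """Split a PyTorch version string into release, nightly date, and local tag."""
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unrecognized PyTorch version: {version!r}")
    return TorchBuild(*m.groups())


def check_matrix_entry(version: str, pytorch_version: str, runtime_version: str) -> None:
    """Raise if the installed build does not match the CI matrix entry."""
    build = parse_torch_version(version)
    if pytorch_version == "pytorch-nightly" and build.dev_date is None:
        raise RuntimeError(f"expected a nightly build, got {version}")
    if build.local != runtime_version:
        raise RuntimeError(f"expected a {runtime_version} wheel, got {build.local}")


if __name__ == "__main__":
    # In CI this would be torch.__version__; the string below is illustrative.
    check_matrix_entry("2.10.0.dev20250101+cu130", "pytorch-nightly", "cu130")
    print(parse_torch_version("2.10.0.dev20250101+cu130"))
    # prints "TorchBuild(base='2.10.0', dev_date='20250101', local='cu130')"
```

Failing fast here makes a silently downgraded or mismatched wheel show up as an explicit error rather than as confusing symm-mem failures later in the job.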
Will wait for #1107 to land first so that CI is clean.
Force-pushed from 03dbe34 to 6427d2f.
Somehow the tests are being skipped in CI; still debugging.
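Skipped tests are easy to miss because a suite with skips still counts as successful. A minimal sketch of how a job could surface them, assuming the distributed tests are gated on GPU count; `requires_gpus`, `visible_gpus`, and `run_and_count_skips` are hypothetical helpers, not Helion's actual test utilities, and this is not a diagnosis of the skips in this PR.

```python
import io
import unittest
from typing import Optional, Tuple, Type

REQUIRED_GPUS = 4  # the distributed job targets 4xH100 runners


def visible_gpus() -> int:
    """Number of visible CUDA devices; 0 if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count() if torch.cuda.is_available() else 0


def requires_gpus(n: int, count: Optional[int] = None):
    """Hypothetical skip decorator: skip unless at least `n` GPUs are visible.

    `count` overrides detection so the behavior can be simulated off-GPU.
    """
    found = visible_gpus() if count is None else count
    return unittest.skipIf(found < n, f"needs {n} GPUs, found {found}")


def run_and_count_skips(case: Type[unittest.TestCase]) -> Tuple[int, int]:
    """Run every test in `case` quietly; return (tests_run, tests_skipped)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return result.testsRun, len(result.skipped)


class ExampleDistributedTest(unittest.TestCase):
    @requires_gpus(REQUIRED_GPUS, count=2)  # simulate a runner exposing only 2 GPUs
    def test_all_reduce(self):
        self.assertTrue(True)


if __name__ == "__main__":
    run, skipped = run_and_count_skips(ExampleDistributedTest)
    print(f"{skipped}/{run} tests skipped")  # prints "1/1 tests skipped"
    # A CI step could exit non-zero here instead of letting skips pass silently.
```

Printing the skip reason (here, the detected GPU count) is usually the fastest way to tell whether a runner is exposing fewer devices than the job expects.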
Force-pushed from adaf8b7 to 63066b9.
Force-pushed from f0072c4 to f24febc.
Force-pushed from 2cd19b4 to 7b9a4ec.
This reverts commit 63066b9.
Force-pushed from 0942d02 to 26032dd.
This reverts commit 26032dd.
Part of #477.
cc @joydddd