Conversation

Contributor

@yf225 yf225 commented Nov 8, 2025

Part of #477.

cc. @joydddd

@yf225 yf225 requested review from jansel and oulgen November 8, 2025 06:58
@meta-cla meta-cla bot added the CLA Signed label Nov 8, 2025
@yf225 yf225 changed the title Add distributed CI job and distributed example unit tests Add distributed CI job and example unit tests Nov 8, 2025
Contributor

@oulgen oulgen left a comment

failing tests

@yf225 yf225 force-pushed the dist_ci_job_and_example_tests branch 2 times, most recently from 426c29d to 683ef76 Compare November 8, 2025 07:37
"image": "nvidia/cuda:13.0.1-devel-ubuntu24.04",
"runtime-version": "cu130",
"container-options": "--gpus all",
"pytorch-version": "pytorch-nightly",
Contributor

Should we just use pinned PyTorch for this? That would avoid building Triton altogether.

Contributor Author

Yeah, I think the PyTorch symm-mem library is still being actively improved, and using PyTorch nightly will ensure that we can iterate on symm-mem and Helion in lockstep to get the latest features.
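
For readers following the pinned-versus-nightly exchange, here is a minimal sketch of the matrix entry under discussion. The four keys are copied from the diff context quoted above. Wrapping them in a single JSON object and the helper `uses_nightly` are assumptions made for illustration, not part of the PR.

```python
import json

# The four keys are copied from the diff context in this thread.
# Treating them as one standalone JSON object is an assumption.
ENTRY_JSON = """
{
  "image": "nvidia/cuda:13.0.1-devel-ubuntu24.04",
  "runtime-version": "cu130",
  "container-options": "--gpus all",
  "pytorch-version": "pytorch-nightly"
}
"""


def uses_nightly(entry: dict) -> bool:
    """Hypothetical helper: True when the entry tracks PyTorch nightly."""
    return entry.get("pytorch-version") == "pytorch-nightly"


entry = json.loads(ENTRY_JSON)
print(uses_nightly(entry))
```

Switching to a pinned build, as the reviewer suggests, would amount to changing the `"pytorch-version"` value. The exact pinned identifier is not given in this thread.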

@yf225 yf225 changed the title Add distributed CI job and example unit tests Add distributed CI job (4xH100) and example unit tests Nov 8, 2025
Contributor Author

yf225 commented Nov 8, 2025

Will wait for #1107 to land first to have a clean CI.

@yf225 yf225 force-pushed the dist_ci_job_and_example_tests branch 2 times, most recently from 03dbe34 to 6427d2f Compare November 8, 2025 23:23
Contributor Author

yf225 commented Nov 9, 2025

Somehow the tests are being skipped in CI; still debugging it.
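
The thread does not say why the tests were skipped, so the following is not a diagnosis. It is a minimal sketch of one common pattern that makes a distributed test report as skipped rather than failed: a device-count guard. The decorator `require_gpus` and the test class are hypothetical, and the GPU count is passed in as a plain integer so the sketch runs without any GPU library installed.

```python
import unittest


def require_gpus(available: int, needed: int):
    """Hypothetical decorator: skip a test unless `available` >= `needed`."""
    return unittest.skipIf(
        available < needed,
        f"needs {needed} GPUs, found {available}",
    )


class ExampleDistributedTest(unittest.TestCase):
    # With 0 visible GPUs this test is reported as skipped, not failed.
    @require_gpus(available=0, needed=4)
    def test_needs_four_gpus(self):
        self.assertTrue(True)


def run_and_count_skips() -> int:
    """Run the example test case and return how many tests were skipped."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(ExampleDistributedTest)
    result = unittest.TestResult()
    suite.run(result)
    return len(result.skipped)


print(run_and_count_skips())
```

A guard like this keeps single-GPU runs green, but it also means a CI job that cannot see its GPUs passes silently, so the skip count is worth checking in the job output.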

@yf225 yf225 force-pushed the dist_ci_job_and_example_tests branch 2 times, most recently from adaf8b7 to 63066b9 Compare November 9, 2025 05:59
@yf225 yf225 marked this pull request as draft November 9, 2025 07:33
@yf225 yf225 changed the title Add distributed CI job (4xH100) and example unit tests [WIP] Add distributed CI job (4xH100) and example unit tests Nov 9, 2025
@yf225 yf225 force-pushed the dist_ci_job_and_example_tests branch 9 times, most recently from f0072c4 to f24febc Compare November 9, 2025 19:51
@yf225 yf225 force-pushed the dist_ci_job_and_example_tests branch 2 times, most recently from 2cd19b4 to 7b9a4ec Compare November 9, 2025 21:30
@yf225 yf225 changed the title [WIP] Add distributed CI job (4xH100) and example unit tests Add distributed CI job (4xH100) and example unit tests Nov 9, 2025
@yf225 yf225 marked this pull request as ready for review November 9, 2025 22:07
@yf225 yf225 force-pushed the dist_ci_job_and_example_tests branch from 0942d02 to 26032dd Compare November 9, 2025 22:27
@yf225 yf225 merged commit 8a23df1 into main Nov 9, 2025
16 checks passed
Labels

CLA Signed This label is managed by the Meta Open Source bot.

4 participants