add rdzv_backend parameter to DDPJobDefinition #87

Merged
merged 2 commits into from
Apr 11, 2023

Conversation

MichaelClifford
Collaborator

This change allows the user to define the rdzv_backend to be used when creating a DDPJobDefinition.

This change is required because the backend is currently hardcoded to "static" in torchx dist.py. However, the required backend differs depending on whether we are using the Ray scheduler or the MCAD scheduler.

This PR will be paired with an upcoming change to torchx/OCP that accounts for the different backends submitted here.
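
For illustration, the parameter described above could look roughly like the sketch below. This is a standalone, hypothetical stand-in and not the actual codeflare-sdk or torchx code: the class name `SketchDDPJobDefinition`, the `script` field, and the `launch_kwargs` helper are all assumptions made for the example. The only details taken from this PR are the `rdzv_backend` name and the previously hardcoded value "static", which is kept as the default here so existing callers would see no change in behavior.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SketchDDPJobDefinition:
    """Hypothetical stand-in for a DDP job definition (not the real class)."""

    script: str  # assumed field: the training script to launch
    # Default mirrors the value that was previously hardcoded, so omitting
    # the argument preserves the old behavior.
    rdzv_backend: str = "static"

    def launch_kwargs(self) -> Dict[str, str]:
        """Collect the values a launcher would receive (assumed helper)."""
        return {"script": self.script, "rdzv_backend": self.rdzv_backend}


# Omitting the argument keeps the old hardcoded value.
default_job = SketchDDPJobDefinition(script="train.py")

# A caller targeting a scheduler that needs a different backend passes it
# explicitly; "c10d" is used here purely as an example value.
custom_job = SketchDDPJobDefinition(script="train.py", rdzv_backend="c10d")
```

Which backend each scheduler actually requires is not stated in this PR, so the sketch leaves that choice to the caller.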

@MichaelClifford force-pushed the rdzv_backend branch 2 times, most recently from 4f57836 to c5d9899 on April 10, 2023 20:13
@anishasthana
Contributor

@MichaelClifford Can you rebase on main?

@MichaelClifford
Collaborator Author

@anishasthana rebased :)

Collaborator

@Maxusmusti Maxusmusti left a comment

LGTM

@Maxusmusti Maxusmusti merged commit c7e2bf1 into project-codeflare:main Apr 11, 2023
3 participants