[Misc] Make timeout passable in init_distributed_environment #24522

jberkhahn · 2025-09-09T16:50:10Z

Purpose

vllm-spyre has been hitting timeouts in this function in certain scenarios, specifically when model compilation is forced to happen serially with multiple backends and large context lengths. The function already has an option to set a timeout; it just isn't passed through here. This PR makes the timeout configurable but leaves the default unchanged (30 minutes).

gemini-code-assist

Code Review

This pull request introduces a configurable timeout for init_distributed_environment, which is a useful addition for scenarios requiring longer initialization times. However, the current implementation has a potential issue where passing the default None value for the timeout to torch.distributed.init_process_group could lead to a runtime error. My review includes suggestions to fix this by conditionally passing the timeout argument, ensuring that the default PyTorch timeout is used when no specific timeout is provided.
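As an illustration of the conditional-passing idea described in the review, here is a minimal sketch. The helper name `build_init_kwargs` is hypothetical and not part of vLLM; the keyword names (`backend`, `init_method`, `world_size`, `rank`, `timeout`) are those accepted by `torch.distributed.init_process_group`. Whether passing `timeout=None` is actually a problem depends on the installed PyTorch version; the sketch sidesteps the question by omitting the key when no timeout is given.

```python
from datetime import timedelta
from typing import Any, Dict, Optional


def build_init_kwargs(
    backend: str,
    init_method: str,
    world_size: int,
    rank: int,
    timeout: Optional[timedelta] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for
    torch.distributed.init_process_group, adding `timeout` only
    when the caller supplied one."""
    kwargs: Dict[str, Any] = {
        "backend": backend,
        "init_method": init_method,
        "world_size": world_size,
        "rank": rank,
    }
    if timeout is not None:
        # Only override the library default when explicitly requested.
        kwargs["timeout"] = timeout
    return kwargs


# Usage (assumes torch is installed and a process group is wanted):
#   torch.distributed.init_process_group(**build_init_kwargs(...))
```

With this shape, callers that never mention a timeout keep whatever default PyTorch applies, and callers with slow serial compilation can pass a longer `timedelta`.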

gemini-code-assist · 2025-09-09T16:52:39Z

vllm/distributed/parallel_state.py

The type hint for timeout should be Optional[timedelta] to correctly reflect that None is a possible value. This improves code clarity and helps static type checkers.

Suggested change

-    timeout: timedelta = None,
-):
+    timeout: Optional[timedelta] = None,
+):
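To make the type-hint point concrete, a small sketch follows. The function `resolve_timeout` and the constant `DEFAULT_TIMEOUT` are hypothetical names used only for illustration; the 30-minute value is the default mentioned in the PR description. Annotating the parameter as `Optional[timedelta]` tells static checkers that `None` is a legal argument, whereas `timeout: timedelta = None` is typically flagged by strict checkers.

```python
from datetime import timedelta
from typing import Optional

# Assumed default, taken from the PR description ("30 minutes").
DEFAULT_TIMEOUT = timedelta(minutes=30)


def resolve_timeout(timeout: Optional[timedelta] = None) -> timedelta:
    """Hypothetical helper: return the caller's timeout, or the
    default when none was given."""
    if timeout is None:
        return DEFAULT_TIMEOUT
    return timeout
```

A call such as `resolve_timeout(timedelta(hours=2))` returns the supplied value, while `resolve_timeout()` falls back to the default.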

prashantgupta24 · 2025-09-09T16:56:59Z

vllm/distributed/parallel_state.py

Optional[timedelta] as already stated by gemini - otherwise lgtm

njhill

LGTM

Signed-off-by: jberkhahn <[email protected]>

gemini-code-assist bot reviewed Sep 9, 2025

prashantgupta24 reviewed Sep 9, 2025

jberkhahn force-pushed the timeout_fix branch from f39eb43 to 3048d38 Compare September 9, 2025 16:57

njhill approved these changes Sep 9, 2025

njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) Sep 9, 2025

jberkhahn force-pushed the timeout_fix branch from 3048d38 to 6574d10 Compare September 9, 2025 19:43

njhill changed the title ~~Make timeout passable in init_distributed_envionment~~ [Misc] Make timeout passable in init_distributed_environment Sep 9, 2025

jberkhahn force-pushed the timeout_fix branch 2 times, most recently from 1aea182 to b43d7bc Compare September 9, 2025 21:46

jberkhahn closed this Sep 10, 2025

jberkhahn force-pushed the timeout_fix branch from b43d7bc to e680723 Compare September 10, 2025 18:12

jberkhahn reopened this Sep 10, 2025

jberkhahn force-pushed the timeout_fix branch 2 times, most recently from abd5623 to e6d6c51 Compare September 10, 2025 20:27

Make timeout configurable in init_distributed_environment

64a8cc7

Signed-off-by: jberkhahn <[email protected]>

jberkhahn force-pushed the timeout_fix branch from e6d6c51 to 64a8cc7 Compare September 10, 2025 20:41

simon-mo approved these changes Sep 10, 2025

simon-mo merged commit cc99baf into vllm-project:main Sep 10, 2025
38 of 42 checks passed

skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025

[Misc] Make timeout passable in init_distributed_environment (vllm-pr…

3866b8e

…oject#24522) Signed-off-by: jberkhahn <[email protected]>

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[Misc] Make timeout passable in init_distributed_environment (vllm-pr…

6fc33ed

…oject#24522) Signed-off-by: jberkhahn <[email protected]>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025

[Misc] Make timeout passable in init_distributed_environment (vllm-pr…

7ea7e6a

…oject#24522) Signed-off-by: jberkhahn <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

[Misc] Make timeout passable in init_distributed_environment (vllm-pr…

84266bf

…oject#24522) Signed-off-by: jberkhahn <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
