Adding one B200 runner (8xB200 GPUs) to PyTorch CI #6869

@nWEIdia

Following the steps described in https://github.com/pytorch/test-infra/blob/main/docs/partners_pytorch_ci_runners.md, I'm creating this issue as a support ticket to add a B200 runner to PyTorch CI.

Current blockers include:

  1. After a reboot, the runner reports "Runner connect error: Registration was not found or is not medium trust. ClientType: . Retrying until reconnected."
  2. Jobs are failing with "aws command not found" errors.
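
For the second blocker, a quick check on the runner host may help narrow things down. The snippet below is only a minimal sketch, assuming the error means the `aws` executable is not on the `PATH` seen by the job environment; the helper name `find_missing_tools` is hypothetical and not part of any PyTorch CI tooling.

```python
import shutil
from typing import Iterable, List, Optional


def find_missing_tools(tools: Iterable[str], path: Optional[str] = None) -> List[str]:
    """Return the tools that cannot be resolved on the given PATH.

    When path is None, shutil.which falls back to the current process's PATH.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    # "aws" is the command named in the error message. Run this as the same
    # user (and ideally from the same service environment) as the runner,
    # since an interactive shell may have a different PATH.
    missing = find_missing_tools(["aws"])
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("aws resolves to:", shutil.which("aws"))
```

If `aws` resolves in an interactive shell but not here, the runner service's `PATH` is the likely difference; if it resolves nowhere, the CLI is probably not installed on the host.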

cc @atalman @malfet @seemethere @ptrblck @tinglvv @eqy

Status: Done