Conversation

@huydhn huydhn commented Jan 13, 2024

To run the upload part in a separate upload job on GH ephemeral runners, we need to:

  1. Give each binary a specific artifact name, so the upload job can find the correct one.
  2. Create a new GHA `setup-binary-upload` to:
    1. Download the artifacts from GitHub.
    2. Run `pkg-helpers` to figure out the correct S3 bucket and path to upload to.
  3. Create a new GHA reusable workflow `_binary_upload` to upload the artifacts to S3.
    1. Run on the GH ephemeral runner `ubuntu-22.04`.
    2. Only this job has access to the credential; the build job no longer has that privilege.

A small caveat here is that the upload job depends on the build job with its whole configuration matrix, so it can only run after all build configurations finish successfully, not as individual builds finish.

The PR is quite big, so I will do a similar follow-up for the conda build after this, using the same `_binary_upload` reusable workflow.
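Put together, the upload job described above might look something like the sketch below. The workflow name `_binary_upload`, the runner label, and the credential isolation come from this PR; the input names, secret names, and the `pkg-helpers` invocation are illustrative assumptions, not the actual workflow:

```yaml
# Illustrative sketch of a _binary_upload reusable workflow (not the real file)
name: Binary upload

on:
  workflow_call:
    inputs:
      artifact-name:
        description: Unique artifact name produced by the build job (assumed input)
        required: true
        type: string
    secrets:
      AWS_ACCESS_KEY_ID:  # hypothetical secret names
        required: true
      AWS_SECRET_ACCESS_KEY:
        required: true

jobs:
  upload:
    # Runs on a GH ephemeral runner; only this job receives the S3 credentials
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: ${{ inputs.artifact-name }}
          path: dist
      - name: Upload to S3
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          # pkg-helpers would compute the real bucket and path; placeholder here
          aws s3 cp dist/ "s3://<bucket>/<path>/" --recursive
```

Because the credentials are declared only on this `workflow_call` interface, the build job never sees them, which is the security point of the split.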

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 13, 2024

@huydhn huydhn changed the title Refactor binary upload job to a separate job running on GH ephemeral runner Refactor wheel upload job to a separate job running on GH ephemeral runner Jan 13, 2024
@huydhn huydhn requested review from atalman and malfet January 13, 2024 09:06
@huydhn huydhn marked this pull request as ready for review January 13, 2024 09:06
@huydhn huydhn merged commit 8acbaa9 into main Jan 15, 2024
huydhn added a commit that referenced this pull request Jan 15, 2024
I made a mistake in #4877 by not explicitly selecting `test-infra` as the repo to check out for the GHA. When running on domain repos, the default checkout is the domain repo itself, e.g. `pytorch/vision`, which doesn't have the GHA we need, namely `setup-binary-upload`.

An example failure when testing on the vision nightly:
https://github.com/pytorch/vision/actions/runs/7533535440/job/20506343331
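The fix boils down to checking out `test-infra` explicitly before referencing the action, so the composite action exists in the workspace even when the workflow runs in a domain repo. A minimal sketch (the checkout path and the action's path are assumptions):

```yaml
steps:
  # Without an explicit repository, actions/checkout defaults to the calling
  # (domain) repo, e.g. pytorch/vision, which has no setup-binary-upload GHA
  - uses: actions/checkout@v4
    with:
      repository: pytorch/test-infra
      path: test-infra
  # Reference the composite action from the checked-out copy (path assumed)
  - uses: ./test-infra/.github/actions/setup-binary-upload
```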
malfet pushed a commit that referenced this pull request Jan 16, 2024
…unner (#4886)

Similar to #4877, this moves the conda upload into a separate job on a GH ephemeral runner:

* I need a new `_binary_conda_upload` reusable workflow because the conda upload uses the anaconda client to upload to conda, not awscli to upload to S3.
* The build job no longer has access to `pytorchbot-env`, so it has no access to the `CONDA_PYTORCHBOT_TOKEN` and `CONDA_PYTORCHBOT_TOKEN_TEST` secrets. Only the upload job has this access.
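The separate conda upload job might be sketched as follows. The workflow name and the two secret names come from the commit message; the artifact input and the `anaconda upload` invocation are illustrative assumptions:

```yaml
# Illustrative sketch of a _binary_conda_upload reusable workflow
jobs:
  conda-upload:
    # Only this job is granted the anaconda token; the build job is not
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: ${{ inputs.artifact-name }}  # assumed input
          path: dist
      - name: Upload to anaconda
        env:
          # A real run would choose CONDA_PYTORCHBOT_TOKEN or
          # CONDA_PYTORCHBOT_TOKEN_TEST depending on the target channel
          ANACONDA_TOKEN: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }}
        run: |
          anaconda -t "${ANACONDA_TOKEN}" upload --force dist/*.tar.bz2
```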
huydhn added a commit that referenced this pull request Feb 12, 2024
The list includes:

* #4870
* #4877
* #4882
* #4886
* #4891
* #4893
* #4894
* #4901

---------

Co-authored-by: Andrey Talman <[email protected]>