
Commit 8642e2d

Vivek Miglani authored and facebook-github-bot committed
Add auto retries for Captum OSS GitHub Actions
Summary: We frequently see sporadic failures in Captum GitHub Actions test workflows, often related to package downloads, HTTP errors, conda environment setup, etc. We add auto-retries so that failed workflows are rerun automatically instead of being retried by hand.

Differential Revision: D64693773
1 parent ed5daa3 commit 8642e2d

5 files changed: 99 additions, 24 deletions


.github/workflows/retry.yml

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+name: Retry Test
+on:
+  workflow_dispatch:
+    inputs:
+      run_id:
+        required: true
+jobs:
+  rerun-on-failure:
+    permissions: write-all
+    runs-on: ubuntu-latest
+    steps:
+      - name: rerun ${{ inputs.run_id }}
+        env:
+          GH_REPO: ${{ github.repository }}
+          GH_TOKEN: ${{ github.token }}
+          GH_DEBUG: api
+        run: |
+          gh run watch ${{ inputs.run_id }} > /dev/null 2>&1
+          gh run rerun ${{ inputs.run_id }} --failed
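
The `run` step above does two things in order: it waits for the given run to finish, then reruns only its failed jobs. As a rough sketch, the same sequence can be wrapped in a shell function for local experimentation. The function name `retry_failed_jobs` and the `GH_BIN` override are hypothetical and not part of this commit; `GH_BIN` exists only so the sequence can be dry-run without calling the real GitHub CLI.

```shell
#!/usr/bin/env bash
# Sketch of the two steps in retry.yml as a reusable function.
# GH_BIN is a hypothetical override (not in the commit) so the sequence
# can be dry-run with `echo` instead of the real GitHub CLI.
retry_failed_jobs() {
  local run_id="$1"
  local gh_bin="${GH_BIN:-gh}"

  # Wait for the run to finish; output is discarded, as in the workflow.
  "$gh_bin" run watch "$run_id" > /dev/null 2>&1

  # Rerun only the jobs that failed in that run.
  "$gh_bin" run rerun "$run_id" --failed
}
```

Calling `GH_BIN=echo retry_failed_jobs 123` prints the rerun command that would be issued instead of executing it. The wait step presumably matters because the job that dispatches this workflow belongs to the run being retried, so that run is still in progress when the retry workflow starts.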
Lines changed: 38 additions & 24 deletions
@@ -1,34 +1,48 @@
 name: Unit-tests for Conda install
 
 on:
-  pull_request:
-  push:
-    branches:
-      - master
+  pull_request:
+  push:
+    branches:
+      - master
 
-  workflow_dispatch:
+  workflow_dispatch:
 
 env:
-  CHANNEL: "nightly"
+  CHANNEL: "nightly"
 
 jobs:
-  tests:
-    strategy:
-      matrix:
-        python_version: ["3.8", "3.9", "3.10", "3.11"]
-      fail-fast: false
-    uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
-    with:
-      runner: linux.12xlarge
-      repository: pytorch/captum
-      script: |
-        # Set up Environment Variables
-        export PYTHON_VERSION="${{ matrix.python_version }}"
+  tests:
+    strategy:
+      matrix:
+        python_version: ["3.8", "3.9", "3.10", "3.11"]
+      fail-fast: false
+    uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+    with:
+      runner: linux.12xlarge
+      repository: pytorch/captum
+      script: |
+        # Set up Environment Variables
+        export PYTHON_VERSION="${{ matrix.python_version }}"
 
-        # Create Conda Env
-        conda create -yp ci_env python="${PYTHON_VERSION}"
-        conda activate /pytorch/captum/ci_env
-        ./scripts/install_via_conda.sh -n
+        # Create Conda Env
+        conda create -yp ci_env python="${PYTHON_VERSION}"
+        conda activate /pytorch/captum/ci_env
+        ./scripts/install_via_conda.sh -n
 
-        # Run Tests
-        python3 -m pytest -ra --cov=. --cov-report term-missing
+        # Run Tests
+        python3 -m pytest -ra --cov=. --cov-report term-missing
+
+  auto-retry:
+    name: Auto retry on failure
+    if: failure() && fromJSON(github.run_attempt) < 2
+    runs-on: ubuntu-latest
+    steps:
+      - name: Start rerun workflow
+        env:
+          GH_REPO: ${{ github.repository }}
+          GH_TOKEN: ${{ github.token }}
+          GH_DEBUG: api
+        run: |
+          gh workflow run retry_build.yml \
+            -F run_id=${{ github.run_id }}
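
The `if:` condition on the `auto-retry` job is what limits the retry to a single automatic attempt: `github.run_attempt` starts at 1 and increases with each rerun, so the comparison `< 2` only holds on the first attempt. A simplified model of that gate, assuming a hypothetical `should_retry` helper in which the first argument stands in for whatever `failure()` evaluates to in the workflow, might look like this:

```shell
#!/usr/bin/env bash
# should_retry OUTCOME RUN_ATTEMPT
# Simplified, hypothetical model of:
#   if: failure() && fromJSON(github.run_attempt) < 2
# OUTCOME stands in for the result of failure(); RUN_ATTEMPT is 1 on the
# first run and is incremented on every rerun.
should_retry() {
  local outcome="$1"
  local attempt="$2"
  if [ "$outcome" = "failure" ] && [ "$attempt" -lt 2 ]; then
    echo "retry"
  else
    echo "skip"
  fi
}

should_retry failure 1   # first attempt failed: dispatch the retry workflow
should_retry failure 2   # the rerun also failed: do not retry again
should_retry success 1   # nothing failed: nothing to do
```

Without the attempt check, a persistently failing workflow would dispatch a new rerun after every failure. The filename passed to `gh workflow run` also has to resolve to a workflow in the repository; this commit adds `.github/workflows/retry.yml` while the dispatch step names `retry_build.yml`, so that pairing may be worth checking.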

.github/workflows/test-pip-cpu-with-mypy.yml

Lines changed: 14 additions & 0 deletions
@@ -25,3 +25,17 @@ jobs:
         ./scripts/run_mypy.sh
         # Run Tests
         python3 -m pytest -ra --cov=. --cov-report term-missing
+
+  auto-retry:
+    name: Auto retry on failure
+    if: failure() && fromJSON(github.run_attempt) < 2
+    runs-on: ubuntu-latest
+    steps:
+      - name: Start rerun workflow
+        env:
+          GH_REPO: ${{ github.repository }}
+          GH_TOKEN: ${{ github.token }}
+          GH_DEBUG: api
+        run: |
+          gh workflow run retry_build.yml \
+            -F run_id=${{ github.run_id }}

.github/workflows/test-pip-cpu.yml

Lines changed: 14 additions & 0 deletions
@@ -35,3 +35,17 @@ jobs:
         ./scripts/install_via_pip.sh ${{ matrix.pytorch_args }} ${{ matrix.transformers_args }}
         # Run Tests
         python3 -m pytest -ra --cov=. --cov-report term-missing
+
+  auto-retry:
+    name: Auto retry on failure
+    if: failure() && fromJSON(github.run_attempt) < 2
+    runs-on: ubuntu-latest
+    steps:
+      - name: Start rerun workflow
+        env:
+          GH_REPO: ${{ github.repository }}
+          GH_TOKEN: ${{ github.token }}
+          GH_DEBUG: api
+        run: |
+          gh workflow run retry_build.yml \
+            -F run_id=${{ github.run_id }}

.github/workflows/test-pip-gpu.yml

Lines changed: 14 additions & 0 deletions
@@ -30,3 +30,17 @@ jobs:
 
         # Run Tests
         python3 -m pytest -ra --cov=. --cov-report term-missing
+
+  auto-retry:
+    name: Auto retry on failure
+    if: failure() && fromJSON(github.run_attempt) < 2
+    runs-on: ubuntu-latest
+    steps:
+      - name: Start rerun workflow
+        env:
+          GH_REPO: ${{ github.repository }}
+          GH_TOKEN: ${{ github.token }}
+          GH_DEBUG: api
+        run: |
+          gh workflow run retry_build.yml \
+            -F run_id=${{ github.run_id }}
