
[🐛 CI]: Jobs are not getting cancelled properly #13483

@titusfortner

What happened?

Putting this in an issue since there is a lot of info and I need help. 😄

The original CI code was:

name: CI

on:
  pull_request:
  push:
    branches:
      - trunk
  schedule:
    - cron: "0 */12 * * *"
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}

For some reason, this job, which was triggered manually ("workflow_dispatch"), was canceled with the message "Canceling since a higher priority waiting request for 'CI-refs/heads/trunk' exists", caused by this job, which was triggered by a push.

Since I wanted to run everything on trunk at that time, and the new job only ran JS tests based on our Bazel check job, I tried to redo the concurrency section.

The idea was to differentiate the groups between something that tested everything ("workflow_dispatch" and "schedule") from the ones that do not necessarily do that ("pull_request" and "push") by appending the string "-all" for the first two. Doing this required using a "fake ternary". The current code is:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}${{ github.event_name == 'workflow_dispatch' && '-all' || '' }}${{ github.event_name == 'schedule' && '-all' || '' }}
  cancel-in-progress: true
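
The two chained fake ternaries can probably be collapsed into one. The following is a sketch, not a tested change: it assumes that `contains` and `fromJSON` behave in the `concurrency` block the same way they do in other workflow expressions, and it is meant to produce exactly the same group names as the code above.

```yaml
concurrency:
  # Appends "-all" for workflow_dispatch and schedule runs, nothing otherwise.
  # The `cond && 'a' || 'b'` fake ternary works here because '-all' is truthy;
  # it would fall through to the right-hand side if the middle value were ''.
  group: ${{ github.workflow }}-${{ github.ref }}${{ contains(fromJSON('["workflow_dispatch", "schedule"]'), github.event_name) && '-all' || '' }}
  cancel-in-progress: true
```

On trunk this should evaluate to `CI-refs/heads/trunk-all` for manual and scheduled runs and to `CI-refs/heads/trunk` for pushes.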

Except now I'm running into a different issue: this job, which runs just the Python tests, gets canceled because of this job, which runs just the Ruby tests.

Looking at this again, I realize that the cancel-in-progress: ${{ github.event_name == 'pull_request' }} should have prevented the original issue I had, so now I'm a little stuck on what this code should be.
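
A possible explanation, going by GitHub's documented concurrency behavior rather than anything confirmed from these particular runs: a concurrency group holds at most one running and one pending run, and a newly queued run replaces any pending run in the same group even when cancel-in-progress is false. If the manual run was still pending behind another trunk run when the push arrived, it would have been canceled with this message. A sketch of one way to keep event types from displacing each other is to put the event name into the group:

```yaml
concurrency:
  # On trunk this yields, for example:
  #   CI-refs/heads/trunk-push
  #   CI-refs/heads/trunk-workflow_dispatch
  #   CI-refs/heads/trunk-schedule
  # so a push can no longer replace a pending manual or scheduled run.
  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.event_name }}
  # Only pull_request runs cancel a run that is already in progress.
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
```

Two pushes in quick succession would still share a group, so a pending push run could still be replaced by a newer one.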

The goal:

  • Cancel a job when a subsequent job on the same branch will re-run the same tests
  • Do not cancel a job when its tests might not be run by a subsequent job
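
One way to approach both goals might be to move concurrency from the workflow level to the job level, so that each language only cancels its own earlier run. This is a sketch under assumptions: the job names `python` and `ruby`, the runner, and the placeholder steps are hypothetical, and it assumes the workflow-level concurrency block is removed.

```yaml
jobs:
  python:  # hypothetical job name
    runs-on: ubuntu-latest
    concurrency:
      # Only another Python job on the same ref shares this group.
      group: ${{ github.workflow }}-${{ github.ref }}-python
      cancel-in-progress: true
    steps:
      - run: echo "Python tests would run here"  # placeholder step

  ruby:  # hypothetical job name
    runs-on: ubuntu-latest
    concurrency:
      # A Ruby job lands in a different group and leaves Python jobs alone.
      group: ${{ github.workflow }}-${{ github.ref }}-ruby
      cancel-in-progress: true
    steps:
      - run: echo "Ruby tests would run here"  # placeholder step
```

Whether a later run that skips the Python job leaves an in-progress Python job untouched is an assumption that would need to be verified before relying on this.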

If we can trust Bazel to cache properly, then maybe we don't need to worry about running more jobs, and we shouldn't cancel anything in progress? Or is caching not good enough? (There have been periods of active development where the CI was backed up by about 4 hours, which led me to believe we were redoing a lot of tests.)

@p0deje / @diemol Any ideas?

Labels: B-build (Includes scripting, bazel and CI integrations), I-defect (Something is not working as intended)
