Conversation


@kwankyu kwankyu commented Feb 7, 2025

As discussed in

https://groups.google.com/g/sage-devel/c/Vx_GkH3ita4

the main CI check for PRs frequently fails for issues whose solutions are unknown. This degrades the quality of life for authors and reviewers alike.

There is already a mechanism to tackle this situation. This PR introduces the "test failure baseline", explained in

https://doc-release--sagemath.netlify.app/html/en/developer/doctesting#auxiliary-files

to the CI check for PRs.

test: https://github.com/kwankyu/sage/actions/runs/13192916789/workflow

After this PR, the actual list of baseline test failures will be maintained in the eternal PR #39471 .
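For illustration, the baseline mechanism can be sketched in plain Python. The file name and JSON shape below are assumptions modeled on `.github/workflows/ci-conda-known-test-failures.json`; the actual format used by the doctester is described in the developer guide linked above.

```python
import json

def load_baseline(path):
    """Load a baseline of known test failures: assumed here to be a JSON
    mapping of module names to failure records, as in the conda CI's
    known-test-failures file."""
    with open(path) as f:
        return json.load(f)

def unexpected_failures(failed_modules, baseline):
    """Return only the failures NOT listed in the baseline; the CI check
    would fail the PR only for these."""
    return [m for m in failed_modules if m not in baseline]

# Hypothetical usage (path and module names are made up):
# baseline = load_baseline(".ci/known-test-failures.json")
# new = unexpected_failures(["sage.foo.bar", "sage.misc.flaky"], baseline)
```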

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation and checked the documentation preview.

⌛ Dependencies

@kwankyu kwankyu changed the title Add baseline test failures Introduce test failure baseline to the main CI for PRs Feb 7, 2025
github-actions bot commented Feb 7, 2025

Documentation preview for this PR (built with commit ec6fa93; changes) is ready! 🎉
This preview will update shortly after each push to this PR.


user202729 commented Feb 7, 2025

So… a generalization of `.github/workflows/ci-conda-known-test-failures.json`? Looks like a good idea.

Some considerations:

  • should we just merge the two files? (there are some false positives, like the `freegb` test that fails on Mac and nowhere else for reasons we don't know, but otherwise it's fine)
  • maybe generalize for `ci-meson` as well? I think meson is the "more modern" variant of conda (meson builds with ninja, conda builds with bootstrap → make)
  • is there a mechanism to check that the entry is removed after some PR fixes the test? (is it even necessary? I think one advantage is that if it pops up again we can point to the PR to say it doesn't work)

(thinking about it, the classical build&test hard-codes the random seed…? Isn't this counterproductive, since it hides randomness? Maybe that's the actual reason why conda/meson seem to fail much more often than otherwise)
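As a toy sketch of the point about hard-coding the seed (plain Python, not Sage's actual test runner): with a fixed seed, a seed-dependent failure either never shows up in CI or always does, instead of surfacing intermittently.

```python
import random

def flaky_check(seed):
    # Hypothetical stand-in for a doctest whose outcome depends on the
    # random seed: it passes for most seeds but fails for a few.
    rng = random.Random(seed)
    return rng.random() < 0.99

# With a hard-coded seed, every CI run sees the identical outcome, so a
# seed-dependent failure is either always hidden or always reproduced:
fixed_runs = [flaky_check(0) for _ in range(10)]
assert len(set(fixed_runs)) == 1
```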


kwankyu commented Feb 7, 2025

So… a generalization of `.github/workflows/ci-conda-known-test-failures.json`?

Exactly.

But I wonder if anyone (@tobiasdiez?) is maintaining it as I plan to do in #39471.

  • should we just merge the two files? (there's some false positive like we don't know why the freegb fails on Mac and nowhere else, but otherwise it's fine)
  • maybe generalize for ci-meson as well? I think meson is the "more modern" variant of conda (meson builds with ninja, conda builds with bootstrap → make)

Perhaps it is better to keep a separate list for each testing environment: "build", "ci-conda", "ci-meson", and also a separate maintenance PR for each of them, as #39471 is for "build".

  • is there a mechanism to check that the entry is removed after some PR fixes the test (is it even necessary? I think one advantage is if it pops up again we can point to the PR to say it doesn't work)

The PR author fixing the test may leave a comment on the PR maintaining the list. If the test failure seems gone (whether fixed by a PR or for some unknown reason), the entry should be deleted. We may add it again whenever it reappears.

(thinking about it the classical build&test hard codes the random seed…? Isn't this counterproductive (hides randomness)? Maybe that's the actual reason why conda/meson seems to fail much more often than otherwise)

No idea. Maybe. Better to search for the PR that introduced it, or ask its author.

@kwankyu kwankyu changed the title Introduce test failure baseline to the main CI for PRs Introduce test failure baseline to the main CI check for PRs Feb 7, 2025

tobiasdiez commented Feb 7, 2025

* maybe generalize for `ci-meson` as well? I think meson is the "more modern" variant of conda (meson builds with ninja, conda builds with bootstrap → make)

I'm not planning to add this for meson. The plan for meson is to move to pytest as the test runner, and pytest doesn't support such "baseline test failures".

* is there a mechanism to check that the entry is removed after some PR fixes the test (is it even necessary? I think one advantage is if it pops up again we can point to the PR to say it doesn't work)

No, there is no such mechanism, and this is the reason why the conda baseline is completely outdated.

I'm not convinced that this is a good approach due to the following lessons that we learned from the conda baseline:

  • As written above, the baseline tends to become outdated very quickly, since people who fix an issue will not check whether it appears in some baseline.
  • The same tests still fail for people on their own systems and thus reduce developer productivity (just check the release announcements or https://github.com/sagemath/sage/issues?q=is%3Aissue%20state%3Aopen%20random to get a feeling for how many tests are failing randomly outside of CI).
  • There is no mention in the documentation that there are known issues with a given example.
  • (Pytest doesn't support baselines, which makes it harder to migrate from our doctest runner to it; I think such a migration was/is also the plan for sage-the-distro.)

My vote would be to mark the failing tests with "known bug", but I would be very open to other solutions that overcome these pain points.
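For context, marking a failing doctest with the "known bug" tag looks roughly like the sketch below. The function name and issue number are hypothetical; the exact tag syntax is documented in the Sage developer guide, so treat the details as an assumption.

```python
def fragile_example():
    """
    Return a value whose doctest is tagged as a known bug, so the
    doctester skips it by default (hypothetical example)::

        sage: fragile_example()  # known bug, see Issue #XXXXX
        42
    """
    return 42
```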


kwankyu commented Feb 7, 2025

This is obvious, but for discussion, let me note that the purpose of the checks for PRs is to verify, for authors and reviewers, that the PR branch satisfies the basic requirements for merging. Hence the results of the checks should depend only on the PR branch. As we all know, there are tests that (frequently) fail regardless of the PR branch. They betray the purpose of the checks on a PR.

Let us restrict our attention to only ordinary PRs that modify the sage library. For such a PR, it should be enough to pass "build" checks. I think "conda" and "meson" checks are irrelevant.

  • As written above, the baseline tends to be outdated very quickly since people that fix an issue will not check if it's in some baseline or not.

For "build" checks, there would be a few entries in the baseline, and they will be updated "live" through #39471.

  • The same tests still fail for people on their systems and thus reduce developer productivity
  • There is no mentioning in the documentation that there are known issues with a given example.

We don't know how these tests come to fail, so there is not much to mention to users.

Anyway, the purpose of the checks is to take care of PRs. Caring for the failing tests is not a concern here; the intention is to ignore them in CI checks.

  • Pytest doesn't support baselines which makes it harder to migrate from our doctest runner to it; I think such a migration was/is also the plan for sage-the-distro

Such a migration, if it ever happens, should still solve the issue of chronic test failures in some way, if not by a baseline.

My vote would be to mark the failing tests with known bug ...

But a test marked "known bug" is expected to always fail, whereas these chronic failures are intermittent.

By the way, I guess that letting the chronic test failures in the conda/meson environments fail PRs does not help us care for or fix them. Then we should stop them from failing PRs. But this issue is out of the scope of the present PR.

user202729 commented

For such a PR, it should be enough to pass "build" checks. I think "conda" and "meson" checks are irrelevant.

For what it's worth, it's possible for pull request authors to forget to update meson-related files, making the meson build fail (while the normal build still passes) and the pull request technically incorrect.

(though I agree that the random failure is highly annoying)

vbraun pushed a commit to vbraun/sage that referenced this pull request Mar 19, 2025
sagemathgh-39664: Add some 'not tested' marks to avoid CI failure
    
As in the title. I don't think there's any advantage in running the test
again.

There's only a very small risk of the fixer forgetting to delete the marker,
but it seems like a nonexistent issue (whichever pull request fixes it
should also remove the `# not tested`).

At least for those that don't segfault or hang. (For those that do, the
only solution I can think of is
sagemath#39539 .)

Side note: not sure what's a good solution to this. Maybe we can do
sagemath#39470 instead? (but then it
doesn't apply to meson…)

### 📝 Checklist


- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation and checked the documentation
preview.

### ⌛ Dependencies

    
URL: sagemath#39664
Reported by: user202729
Reviewer(s):
vbraun pushed a commit to vbraun/sage that referenced this pull request Mar 22, 2025
sagemathgh-39664: Add some 'not tested' marks to avoid CI failure
    
@kwankyu kwankyu marked this pull request as draft August 31, 2025 03:34