Conversation


@kwankyu kwankyu commented Feb 7, 2025

As discussed in

https://groups.google.com/g/sage-devel/c/Vx_GkH3ita4

the main CI check for PRs frequently fails for issues whose solutions are unknown. This degrades the quality of life for authors and reviewers alike.

There is already a mechanism to tackle this situation. This PR introduces the "test failure baseline", explained in

https://doc-release--sagemath.netlify.app/html/en/developer/doctesting#auxiliary-files

to the CI check for PRs.

test: https://github.com/kwankyu/sage/actions/runs/13192916789/workflow

After this PR, the actual list of baseline test failures will be maintained in the eternal PR #39471 .
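For illustration, the baseline mechanism can be sketched in plain Python. The file name and JSON shape below are assumptions modeled on `.github/workflows/ci-conda-known-test-failures.json`; the actual format used by the doctester is described in the developer guide linked above.

```python
import json

def load_baseline(path):
    """Load a baseline of known test failures: assumed here to be a JSON
    mapping of module names to failure records, as in the conda CI's
    known-test-failures file."""
    with open(path) as f:
        return json.load(f)

def unexpected_failures(failed_modules, baseline):
    """Return only the failures NOT listed in the baseline; the CI check
    would fail the PR only for these."""
    return [m for m in failed_modules if m not in baseline]

# Hypothetical usage (path and module names are made up):
# baseline = load_baseline(".ci/known-test-failures.json")
# new = unexpected_failures(["sage.foo.bar", "sage.misc.flaky"], baseline)
```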

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation and checked the documentation preview.

⌛ Dependencies

@kwankyu kwankyu changed the title Add baseline test failures Introduce test failure baseline to the main CI for PRs Feb 7, 2025
github-actions bot commented Feb 7, 2025

Documentation preview for this PR (built with commit ec6fa93; changes) is ready! 🎉
This preview will update shortly after each push to this PR.


user202729 commented Feb 7, 2025

So… a generalization of `.github/workflows/ci-conda-known-test-failures.json`? Looks like a good idea.

Some considerations:

  • should we just merge the two files? (there are some false positives, like the `freegb` test that fails on Mac and nowhere else for reasons we don't know, but otherwise it's fine)
  • maybe generalize for `ci-meson` as well? I think meson is the "more modern" variant of conda (meson builds with ninja, conda builds with bootstrap → make)
  • is there a mechanism to check that the entry is removed after some PR fixes the test? (is it even necessary? I think one advantage is that if it pops up again we can point to the PR to say it doesn't work)

(thinking about it, the classical build&test hard-codes the random seed…? Isn't this counterproductive, since it hides randomness? Maybe that's the actual reason why conda/meson seem to fail much more often than otherwise)
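As a toy sketch of the point about hard-coding the seed (plain Python, not Sage's actual test runner): with a fixed seed, a seed-dependent failure either never shows up in CI or always does, instead of surfacing intermittently.

```python
import random

def flaky_check(seed):
    # Hypothetical stand-in for a doctest whose outcome depends on the
    # random seed: it passes for most seeds but fails for a few.
    rng = random.Random(seed)
    return rng.random() < 0.99

# With a hard-coded seed, every CI run sees the identical outcome, so a
# seed-dependent failure is either always hidden or always reproduced:
fixed_runs = [flaky_check(0) for _ in range(10)]
assert len(set(fixed_runs)) == 1
```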


kwankyu commented Feb 7, 2025

So… a generalization of `.github/workflows/ci-conda-known-test-failures.json`?

Exactly.

But I wonder if anyone (@tobiasdiez?) is maintaining it as I plan to do in #39471.

  • should we just merge the two files? (there's some false positive like we don't know why the freegb fails on Mac and nowhere else, but otherwise it's fine)
  • maybe generalize for ci-meson as well? I think meson is the "more modern" variant of conda (meson builds with ninja, conda builds with bootstrap → make)

Perhaps it is better to keep a separate list for each testing environment: "build", "ci-conda", "ci-meson", and also a separate maintenance PR for each of them, as #39471 is for "build".

  • is there a mechanism to check that the entry is removed after some PR fixes the test (is it even necessary? I think one advantage is if it pops up again we can point to the PR to say it doesn't work)

The PR author fixing the test may leave a comment on the PR maintaining the list. If the test failure seems gone (whether fixed by a PR or for some unknown reason), the entry should be deleted. We may add it again whenever it reappears.

(thinking about it the classical build&test hard codes the random seed…? Isn't this counterproductive (hides randomness)? Maybe that's the actual reason why conda/meson seems to fail much more often than otherwise)

No idea. Maybe. Better to search for the PR that introduced it, or ask its author.

@kwankyu kwankyu changed the title Introduce test failure baseline to the main CI for PRs Introduce test failure baseline to the main CI check for PRs Feb 7, 2025

tobiasdiez commented Feb 7, 2025

* maybe generalize for `ci-meson` as well? I think meson is the "more modern" variant of conda (meson builds with ninja, conda builds with bootstrap → make)

I'm not planning to add this for meson. The plan for meson is to move to pytest as the test runner, and pytest doesn't support such "baseline test failures".

* is there a mechanism to check that the entry is removed after some PR fixes the test (is it even necessary? I think one advantage is if it pops up again we can point to the PR to say it doesn't work)

No, there is no such mechanism, and this is the reason why the conda baseline is completely outdated.

I'm not convinced that this is a good approach due to the following lessons that we learned from the conda baseline:

  • As written above, the baseline tends to become outdated very quickly, since people who fix an issue will not check whether it appears in some baseline.
  • The same tests still fail for people on their own systems and thus reduce developer productivity (just check the release announcements or https://github.com/sagemath/sage/issues?q=is%3Aissue%20state%3Aopen%20random to get a feeling for how many tests are failing randomly outside of CI).
  • There is no mention in the documentation that there are known issues with a given example.
  • (Pytest doesn't support baselines, which makes it harder to migrate from our doctest runner to it; I think such a migration was/is also the plan for sage-the-distro.)

My vote would be to mark the failing tests with "known bug", but I would be very open to other solutions that overcome these pain points.
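For context, marking a failing doctest with the "known bug" tag looks roughly like the sketch below. The function name and issue number are hypothetical; the exact tag syntax is documented in the Sage developer guide, so treat the details as an assumption.

```python
def fragile_example():
    """
    Return a value whose doctest is tagged as a known bug, so the
    doctester skips it by default (hypothetical example)::

        sage: fragile_example()  # known bug, see Issue #XXXXX
        42
    """
    return 42
```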


kwankyu commented Feb 7, 2025

This is obvious, but for discussion, let me note that the purpose of the checks for PRs is to verify, for authors and reviewers, that the PR branch satisfies the basic requirements for merging. Hence the results of the checks should depend only on the PR branch. As we all know, there are tests that (frequently) fail regardless of the PR branch. They betray the purpose of the checks on a PR.

Let us restrict our attention to only ordinary PRs that modify the sage library. For such a PR, it should be enough to pass "build" checks. I think "conda" and "meson" checks are irrelevant.

  • As written above, the baseline tends to be outdated very quickly since people that fix an issue will not check if it's in some baseline or not.

For "build" checks, there would be a few entries in the baseline, and they will be updated "live" through #39471.

  • The same tests still fail for people on their systems and thus reduce developer productivity
  • There is no mentioning in the documentation that there are known issues with a given example.

We don't know how these tests come to fail, so there is not much to mention to users.

Anyway, the purpose of the checks is to take care of PRs. Caring for the failing tests is not a concern here; the intention is to ignore them in CI checks.

  • Pytest doesn't support baselines which makes it harder to migrate from our doctest runner to it; I think such a migration was/is also the plan for sage-the-distro

Such a migration, if it ever happens, should still solve the issue of chronic test failures in some way, if not by a baseline.

My vote would be to mark the failing tests with known bug ...

But a test marked "known bug" is expected to always fail, whereas these chronic failures are intermittent.

By the way, I guess that letting the chronic test failures in the conda/meson environments fail PRs does not help us care for or fix them. Then we should stop them from failing PRs. But this issue is out of the scope of the present PR.

user202729 commented

For such a PR, it should be enough to pass "build" checks. I think "conda" and "meson" checks are irrelevant.

For what it's worth, it's possible for pull request authors to forget to update meson-related files, making the meson build fail (while the normal build still passes) and the pull request technically incorrect.

(though I agree that the random failure is highly annoying)

vbraun pushed a commit to vbraun/sage that referenced this pull request Mar 19, 2025
sagemathgh-39664: Add some 'not tested' marks to avoid CI failure
    
As in the title. I don't think there's any advantage in running the test
again.

There's only a very small risk of the fixer forgetting to delete the marker,
but it seems like a nonexistent issue (whichever pull request fixes it
should also remove the `# not tested`).

At least for those that don't segfault or hang. (For those that do, the
only solution I can think of is
sagemath#39539 .)

Side note: not sure what's a good solution to this. Maybe we can do
sagemath#39470 instead? (but then it
doesn't apply to meson…)

### 📝 Checklist


- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation and checked the documentation
preview.

### ⌛ Dependencies

    
URL: sagemath#39664
Reported by: user202729
Reviewer(s):
vbraun pushed a commit to vbraun/sage that referenced this pull request Mar 22, 2025
sagemathgh-39664: Add some 'not tested' marks to avoid CI failure
    
@kwankyu kwankyu marked this pull request as draft August 31, 2025 03:34