-
Notifications
You must be signed in to change notification settings - Fork 277
Run KNOWNBUG and THOROUGH regression tests in CI #5958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #5958 +/- ##
===========================================
+ Coverage 74.52% 75.11% +0.59%
===========================================
Files 1447 1447
Lines 157808 157807 -1
===========================================
+ Hits 117610 118543 +933
+ Misses 40198 39264 -934
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice idea. Is it worth running this once first and PRing any of the KNOWNBUG tests that now work? My impression is there are probably going to be some and they may need attention to figure out if they are really working on just incorrect tests.
I'm not a huge fan of repurposing existing jobs to be honest. Would it be possible that this gets done as a separate job? Otherwise I agree that this is an excellent idea. |
c154554
to
fcf9195
Compare
I guess I'm still living the times of limited Travis runners :-) That's fixed now, added two new jobs. |
Belated response: yes, I'll try to |
e257d69
to
aa6921d
Compare
8912f94
to
d303849
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of the KNOWNBUG checking to routinely check for improvements/changes. The THOROUGH on all PRs seems like it may be a lot of extra load?
I'm not sure we care? "CBMC / codecov-coverage-report" takes longer (102 vs 75 minutes), and GitHub happily runs all of the in parallel (up to 20 - according to https://docs.github.com/en/actions/reference/usage-limits-billing-and-administration#usage-limits). |
d303849
to
ab0071c
Compare
20ece87
to
0bbacb1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broadly this PR looks good, but I would prefer that other fixes/changes that are not simply a CI change be a separate PR. For example 0084739 appears to be bug fix that was found along the way (and great to have done), but not part of the titular change.
Fair point, I've factored those out into #6102, #6103, #6104. The remaining commits could perhaps be factored out as well, but wouldn't be subject to CI-based testing until last commit of this PR is merged (which, in turn, can only be merged if fixing those issues first). |
These had been marked as THOROUGH without fully checking their status.
The assertion of interest now has number 3, which is irrelevant and should not cause the test to fail.
The block includes lines 10 and 11.
The reference implementation and the built-in function disagree in case the index is out of bounds.
Several tests spuriously failed as they were lacking the Java models library. Some of them also do not need to be marked as "THOROUGH" for they terminate in under 1 second. One test moved from THOROUGH to KNOWNBUG as the assertions are (unexpectedly) deemed unreachable.
Copy the check-ubuntu-20_04-cmake-gcc job to run KNOWNBUG (any test reported as failure will tell us that a bug has unexpectedly been fixed) an THOROUGH (tests that are expected to pass, but take longer to do so) tests, each as a separate GitHub action. Also run all tests tagged broken-smt-backend to confirm they haven't been fixed.
@TGWDB All dependencies are now merged and this PR is rebased and should be ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this personally, provided we don't mark the extra CI jobs as required - they shouldn't be blocking PRs in my opinion.
Use the check-ubuntu-20_04-cmake-gcc job to run KNOWNBUG (any test
reported as failure will tell us that a bug has unexpectedly been fixed)
an THOROUGH (tests that are expected to pass, but take longer to do so)
tests.