Skip to content

Run KNOWNBUG and THOROUGH regression tests in CI #5958

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 12, 2021

Conversation

tautschnig
Copy link
Collaborator

Use the check-ubuntu-20_04-cmake-gcc job to run KNOWNBUG (any test
reported as failure will tell us that a bug has unexpectedly been fixed)
an THOROUGH (tests that are expected to pass, but take longer to do so)
tests.

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig self-assigned this Mar 19, 2021
@codecov
Copy link

codecov bot commented Mar 19, 2021

Codecov Report

Merging #5958 (882c670) into develop (4c14789) will increase coverage by 0.59%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #5958      +/-   ##
===========================================
+ Coverage    74.52%   75.11%   +0.59%     
===========================================
  Files         1447     1447              
  Lines       157808   157807       -1     
===========================================
+ Hits        117610   118543     +933     
+ Misses       40198    39264     -934     
Impacted Files Coverage Δ
src/solvers/smt2/smt2_conv.cpp 60.63% <0.00%> (+0.22%) ⬆️
src/solvers/lowering/byte_operators.cpp 92.16% <0.00%> (+0.36%) ⬆️
src/ansi-c/c_typecast.cpp 79.00% <0.00%> (+0.55%) ⬆️
src/goto-programs/goto_program.h 91.63% <0.00%> (+0.64%) ⬆️
src/goto-instrument/wmm/cycle_collection.cpp 88.04% <0.00%> (+3.26%) ⬆️
src/goto-instrument/rw_set.h 47.27% <0.00%> (+5.45%) ⬆️
src/goto-instrument/wmm/event_graph.h 71.29% <0.00%> (+6.48%) ⬆️
src/goto-instrument/wmm/goto2graph.cpp 54.41% <0.00%> (+9.11%) ⬆️
src/goto-instrument/rw_set.cpp 52.68% <0.00%> (+10.75%) ⬆️
src/solvers/flattening/boolbv_index.cpp 89.72% <0.00%> (+10.77%) ⬆️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 865d3b6...882c670. Read the comment docs.

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice idea. Is it worth running this once first and PRing any of the KNOWNBUG tests that now work? My impression is there are probably going to be some and they may need attention to figure out if they are really working on just incorrect tests.

@NlightNFotis
Copy link
Contributor

NlightNFotis commented Mar 22, 2021

I'm not a huge fan of repurposing existing jobs to be honest.

Would it be possible that this gets done as a separate job?

Otherwise I agree that this is an excellent idea.

@tautschnig
Copy link
Collaborator Author

I'm not a huge fan of repurposing existing jobs to be honest.

Would it be possible that this gets done as a separate job?

Otherwise I agree that this is an excellent idea.

I guess I'm still living the times of limited Travis runners :-) That's fixed now, added two new jobs.

@tautschnig
Copy link
Collaborator Author

This is a nice idea. Is it worth running this once first and PRing any of the KNOWNBUG tests that now work? My impression is there are probably going to be some and they may need attention to figure out if they are really working on just incorrect tests.

Belated response: yes, I'll try to git bisect to figure out when exactly tests were fixed. Will keep this PR in "Draft" state for the time being.

@tautschnig tautschnig force-pushed the test-thorough branch 5 times, most recently from e257d69 to aa6921d Compare April 1, 2021 19:20
@tautschnig tautschnig force-pushed the test-thorough branch 4 times, most recently from 8912f94 to d303849 Compare April 2, 2021 13:55
@tautschnig tautschnig marked this pull request as ready for review April 2, 2021 16:55
@tautschnig tautschnig requested a review from a team as a code owner April 2, 2021 16:55
@tautschnig tautschnig assigned NlightNFotis and TGWDB and unassigned tautschnig Apr 2, 2021
Copy link
Contributor

@TGWDB TGWDB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of the KNOWNBUG checking to routinely check for improvements/changes. The THOROUGH on all PRs seems like it may be a lot of extra load?

@tautschnig
Copy link
Collaborator Author

I like the idea of the KNOWNBUG checking to routinely check for improvements/changes. The THOROUGH on all PRs seems like it may be a lot of extra load?

I'm not sure we care? "CBMC / codecov-coverage-report" takes longer (102 vs 75 minutes), and GitHub happily runs all of the in parallel (up to 20 - according to https://docs.github.com/en/actions/reference/usage-limits-billing-and-administration#usage-limits).

@tautschnig tautschnig self-assigned this May 6, 2021
@tautschnig tautschnig force-pushed the test-thorough branch 3 times, most recently from 20ece87 to 0bbacb1 Compare May 9, 2021 09:05
Copy link
Contributor

@TGWDB TGWDB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broadly this PR looks good, but I would prefer that other fixes/changes that are not simply a CI change be a separate PR. For example 0084739 appears to be bug fix that was found along the way (and great to have done), but not part of the titular change.

@tautschnig
Copy link
Collaborator Author

Broadly this PR looks good, but I would prefer that other fixes/changes that are not simply a CI change be a separate PR. For example 0084739 appears to be bug fix that was found along the way (and great to have done), but not part of the titular change.

Fair point, I've factored those out into #6102, #6103, #6104. The remaining commits could perhaps be factored out as well, but wouldn't be subject to CI-based testing until last commit of this PR is merged (which, in turn, can only be merged if fixing those issues first).

@tautschnig tautschnig changed the title Run KNOWNBUG and THOROUGH regression tests in CI Run KNOWNBUG and THOROUGH regression tests in CI [depends-on: #6102, #6103, #6104] May 10, 2021
These had been marked as THOROUGH without fully checking their status.
The assertion of interest now has number 3, which is irrelevant and
should not cause the test to fail.
The reference implementation and the built-in function disagree in case
the index is out of bounds.
Several tests spuriously failed as they were lacking the Java models
library. Some of them also do not need to be marked as "THOROUGH" for
they terminate in under 1 second. One test moved from THOROUGH to
KNOWNBUG as the assertions are (unexpectedly) deemed unreachable.
Copy the check-ubuntu-20_04-cmake-gcc job to run KNOWNBUG (any test
reported as failure will tell us that a bug has unexpectedly been fixed)
an THOROUGH (tests that are expected to pass, but take longer to do so)
tests, each as a separate GitHub action.

Also run all tests tagged broken-smt-backend to confirm they haven't
been fixed.
@tautschnig tautschnig changed the title Run KNOWNBUG and THOROUGH regression tests in CI [depends-on: #6102, #6103, #6104] Run KNOWNBUG and THOROUGH regression tests in CI May 12, 2021
@tautschnig tautschnig removed their assignment May 12, 2021
@tautschnig
Copy link
Collaborator Author

@TGWDB All dependencies are now merged and this PR is rebased and should be ready for review.

Copy link
Contributor

@NlightNFotis NlightNFotis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this personally, provided we don't mark the extra CI jobs as required - they shouldn't be blocking PRs in my opinion.

@tautschnig tautschnig merged commit ffd4ccf into diffblue:develop May 12, 2021
@tautschnig tautschnig deleted the test-thorough branch May 12, 2021 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants