Run KNOWNBUG and THOROUGH regression tests in CI #5958

tautschnig · 2021-03-19T15:25:22Z

Use the check-ubuntu-20_04-cmake-gcc job to run KNOWNBUG (any test
reported as failure will tell us that a bug has unexpectedly been fixed)
and THOROUGH (tests that are expected to pass, but take longer to do so)
tests.
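To make the inverted KNOWNBUG semantics concrete, here is a minimal Python sketch. It is not the project's test harness: `ci_verdict` and its arguments are hypothetical names, and the only assumption taken from the description above is that a KNOWNBUG test whose output matches the expected (correct) result is reported as a failure, because the bug has unexpectedly been fixed.

```python
def ci_verdict(category: str, matched_expected: bool) -> str:
    """Map a raw test outcome to what CI reports (hypothetical helper).

    category:         test tag, e.g. "CORE", "THOROUGH" or "KNOWNBUG"
    matched_expected: True if the tool output matched the expected
                      (i.e. correct) result recorded for the test
    """
    if category == "KNOWNBUG":
        # Inverted: matching the correct output means the bug is gone,
        # so the test should be re-tagged and CI flags it.
        if matched_expected:
            return "fail: bug unexpectedly fixed"
        return "pass: bug still present"
    # CORE and THOROUGH tests are simply expected to match.
    return "pass" if matched_expected else "fail"


if __name__ == "__main__":
    for category in ("THOROUGH", "KNOWNBUG"):
        for matched in (True, False):
            print(category, matched, "->", ci_verdict(category, matched))
```

Under this reading, a red KNOWNBUG job is good news that still needs a follow-up: someone has to check whether the test is genuinely fixed and then change its tag.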

Each commit message has a non-empty body, explaining why the change was made.
n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
n/a My commit message includes data points confirming performance improvements (if claimed).
My PR is restricted to a single feature or bugfix.
n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

codecov · 2021-03-19T16:47:23Z

Codecov Report

Merging #5958 (882c670) into develop (4c14789) will increase coverage by 0.59%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop    #5958      +/-   ##
===========================================
+ Coverage    74.52%   75.11%   +0.59%     
===========================================
  Files         1447     1447              
  Lines       157808   157807       -1     
===========================================
+ Hits        117610   118543     +933     
+ Misses       40198    39264     -934

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/solvers/smt2/smt2_conv.cpp | `60.63% <0.00%> (+0.22%)` | ⬆️ |
| src/solvers/lowering/byte_operators.cpp | `92.16% <0.00%> (+0.36%)` | ⬆️ |
| src/ansi-c/c_typecast.cpp | `79.00% <0.00%> (+0.55%)` | ⬆️ |
| src/goto-programs/goto_program.h | `91.63% <0.00%> (+0.64%)` | ⬆️ |
| src/goto-instrument/wmm/cycle_collection.cpp | `88.04% <0.00%> (+3.26%)` | ⬆️ |
| src/goto-instrument/rw_set.h | `47.27% <0.00%> (+5.45%)` | ⬆️ |
| src/goto-instrument/wmm/event_graph.h | `71.29% <0.00%> (+6.48%)` | ⬆️ |
| src/goto-instrument/wmm/goto2graph.cpp | `54.41% <0.00%> (+9.11%)` | ⬆️ |
| src/goto-instrument/rw_set.cpp | `52.68% <0.00%> (+10.75%)` | ⬆️ |
| src/solvers/flattening/boolbv_index.cpp | `89.72% <0.00%> (+10.77%)` | ⬆️ |
... and 9 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 865d3b6...882c670.

martin-cs

This is a nice idea. Is it worth running this once first and PRing any of the KNOWNBUG tests that now work? My impression is there are probably going to be some and they may need attention to figure out if they are really working or just incorrect tests.

NlightNFotis · 2021-03-22T10:24:47Z

I'm not a huge fan of repurposing existing jobs to be honest.

Would it be possible that this gets done as a separate job?

Otherwise I agree that this is an excellent idea.

tautschnig · 2021-03-26T08:18:16Z

> I'm not a huge fan of repurposing existing jobs to be honest.
>
> Would it be possible that this gets done as a separate job?
>
> Otherwise I agree that this is an excellent idea.

I guess I'm still living in the times of limited Travis runners :-) That's fixed now, added two new jobs.

tautschnig · 2021-03-26T08:19:26Z

> This is a nice idea. Is it worth running this once first and PRing any of the KNOWNBUG tests that now work? My impression is there are probably going to be some and they may need attention to figure out if they are really working or just incorrect tests.

Belated response: yes, I'll try to git bisect to figure out when exactly tests were fixed. Will keep this PR in "Draft" state for the time being.

TGWDB

I like the idea of the KNOWNBUG checking to routinely check for improvements/changes. The THOROUGH on all PRs seems like it may be a lot of extra load?

.github/workflows/pull-request-checks.yaml

tautschnig · 2021-04-06T11:12:38Z

> I like the idea of the KNOWNBUG checking to routinely check for improvements/changes. The THOROUGH on all PRs seems like it may be a lot of extra load?

I'm not sure we care? "CBMC / codecov-coverage-report" takes longer (102 vs 75 minutes), and GitHub happily runs all of them in parallel (up to 20, according to https://docs.github.com/en/actions/reference/usage-limits-billing-and-administration#usage-limits).
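The argument here is that jobs running in parallel cost wall-clock time equal to the slowest job, not the sum. A rough Python sketch of that arithmetic follows; `wall_clock_minutes` is a hypothetical helper, the scheduling model (each job starts on whichever runner frees up first) is a simplifying assumption, and only the 102 and 75 minute durations and the limit of 20 concurrent jobs come from the comment above.

```python
import heapq


def wall_clock_minutes(durations, max_parallel):
    """Estimate total wall-clock time for a set of CI jobs.

    Simplified model (an assumption, not GitHub's actual scheduler):
    jobs are started in order, each on the runner that becomes free first.
    """
    runners = [0] * max_parallel  # time at which each runner is free
    heapq.heapify(runners)
    for minutes in durations:
        start = heapq.heappop(runners)
        heapq.heappush(runners, start + minutes)
    return max(runners)


if __name__ == "__main__":
    jobs = [102, 75]  # coverage report vs. the new job, per the comment
    print(wall_clock_minutes(jobs, max_parallel=20))
    print(wall_clock_minutes(jobs, max_parallel=1))
```

With both jobs running side by side the estimate stays at 102 minutes, so the new job does not lengthen the pipeline as long as free runners are available; run serially it would be 177 minutes.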

TGWDB

Broadly this PR looks good, but I would prefer that other fixes/changes that are not simply a CI change be a separate PR. For example 0084739 appears to be a bug fix that was found along the way (and great to have done), but not part of the titular change.

tautschnig · 2021-05-10T10:23:36Z

> Broadly this PR looks good, but I would prefer that other fixes/changes that are not simply a CI change be a separate PR. For example 0084739 appears to be a bug fix that was found along the way (and great to have done), but not part of the titular change.

Fair point, I've factored those out into #6102, #6103, #6104. The remaining commits could perhaps be factored out as well, but they wouldn't be subject to CI-based testing until the last commit of this PR is merged (which, in turn, can only be merged once those issues are fixed).

These had been marked as THOROUGH without fully checking their status.

The assertion of interest now has number 3, which is irrelevant and should not cause the test to fail.

The block includes lines 10 and 11.

The reference implementation and the built-in function disagree in case the index is out of bounds.

Several tests spuriously failed as they were lacking the Java models library. Some of them also do not need to be marked as "THOROUGH" for they terminate in under 1 second. One test moved from THOROUGH to KNOWNBUG as the assertions are (unexpectedly) deemed unreachable.

Copy the check-ubuntu-20_04-cmake-gcc job to run KNOWNBUG (any test reported as failure will tell us that a bug has unexpectedly been fixed) and THOROUGH (tests that are expected to pass, but take longer to do so) tests, each as a separate GitHub action. Also run all tests tagged broken-smt-backend to confirm they haven't been fixed.

tautschnig · 2021-05-12T12:14:25Z

@TGWDB All dependencies are now merged and this PR is rebased and should be ready for review.

NlightNFotis

I'm happy with this personally, provided we don't mark the extra CI jobs as required - they shouldn't be blocking PRs in my opinion.

tautschnig added the Tests label Mar 19, 2021

tautschnig self-assigned this Mar 19, 2021

martin-cs approved these changes Mar 21, 2021

tautschnig mentioned this pull request Mar 26, 2021

Fix SMT2 encoding of array_of_exprt #5974

Merged

7 tasks

tautschnig force-pushed the test-thorough branch from c154554 to fcf9195 March 26, 2021 08:17

tautschnig force-pushed the test-thorough branch 5 times, most recently from e257d69 to aa6921d April 1, 2021 19:20

tautschnig added the dependent - do not merge label Apr 1, 2021

tautschnig force-pushed the test-thorough branch 4 times, most recently from 8912f94 to d303849 April 2, 2021 13:55

tautschnig marked this pull request as ready for review April 2, 2021 16:55

tautschnig requested a review from a team as a code owner April 2, 2021 16:55

tautschnig removed the dependent - do not merge label Apr 2, 2021

tautschnig assigned NlightNFotis and TGWDB and unassigned tautschnig Apr 2, 2021

TGWDB reviewed Apr 6, 2021

.github/workflows/pull-request-checks.yaml

TGWDB reviewed Apr 6, 2021

.github/workflows/pull-request-checks.yaml

tautschnig force-pushed the test-thorough branch from d303849 to ab0071c April 6, 2021 11:14

tautschnig force-pushed the test-thorough branch from ab0071c to f84b250 May 6, 2021 14:53

tautschnig self-assigned this May 6, 2021

tautschnig force-pushed the test-thorough branch from f84b250 to bfc663e May 7, 2021 21:31

tautschnig requested review from chrisr-diffblue, peterschrammel and smowton as code owners May 7, 2021 21:31

tautschnig force-pushed the test-thorough branch 3 times, most recently from 20ece87 to 0bbacb1 May 9, 2021 09:05

TGWDB reviewed May 10, 2021

tautschnig changed the title ~~Run KNOWNBUG and THOROUGH regression tests in CI~~ Run KNOWNBUG and THOROUGH regression tests in CI [depends-on: #6102, #6103, #6104] May 10, 2021

tautschnig added the dependent - do not merge label May 10, 2021

tautschnig added 6 commits May 12, 2021 12:12

Mark failing goto-instrument-wmm tests as KNOWNBUG

e94c3ba

These had been marked as THOROUGH without fully checking their status.

gcc_popcount2: fix assertion matching pattern

36534c1

The assertion of interest now has number 3, which is irrelevant and should not cause the test to fail.

Fix line number to make location15 test pass as expected

b4614be

The block includes lines 10 and 11.

jbmc-strings/VerifStringLastIndexOf: ensure index is within String

9165f86

The reference implementation and the built-in function disagree in case the index is out of bounds.

tautschnig force-pushed the test-thorough branch from 0bbacb1 to 882c670 May 12, 2021 12:13

tautschnig changed the title ~~Run KNOWNBUG and THOROUGH regression tests in CI [depends-on: #6102, #6103, #6104]~~ Run KNOWNBUG and THOROUGH regression tests in CI May 12, 2021

tautschnig removed the dependent - do not merge label May 12, 2021

tautschnig removed their assignment May 12, 2021

NlightNFotis approved these changes May 12, 2021

tautschnig merged commit ffd4ccf into diffblue:develop May 12, 2021

tautschnig deleted the test-thorough branch May 12, 2021 14:58
