Skip to content

Reland "[Utils] add update-verify-tests.py" (#108630)" #108658

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 14, 2024

Conversation

hnrklssn
Copy link
Member

This relands commit d4f41be which was reverted by b7e585b.

This version ignores differences in line endings in the diff tests to make sure the tests work as intended on Windows.

Original description below:
Adds a python script to automatically take output from a failed clang -verify test and update the test case(s) to expect the new behaviour.

This relands commit d4f41be
which was reverted by b7e585b.

This version ignores differences in line endings in the diff tests to
make sure the tests work as intended on Windows.

Original description below:
Adds a python script to automatically take output from a failed clang
-verify test and update the test case(s) to expect the new behaviour.
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Sep 13, 2024
@llvmbot
Copy link
Member

llvmbot commented Sep 13, 2024

@llvm/pr-subscribers-clang

Author: Henrik G. Olsson (hnrklssn)

Changes

This relands commit d4f41be which was reverted by b7e585b.

This version ignores differences in line endings in the diff tests to make sure the tests work as intended on Windows.

Original description below:
Adds a python script to automatically take output from a failed clang -verify test and update the test case(s) to expect the new behaviour.


Patch is 33.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/108658.diff

36 Files Affected:

  • (added) clang/test/utils/update-verify-tests/Inputs/duplicate-diag.c (+8)
  • (added) clang/test/utils/update-verify-tests/Inputs/duplicate-diag.c.expected (+8)
  • (added) clang/test/utils/update-verify-tests/Inputs/infer-indentation.c (+8)
  • (added) clang/test/utils/update-verify-tests/Inputs/infer-indentation.c.expected (+11)
  • (added) clang/test/utils/update-verify-tests/Inputs/leave-existing-diags.c (+11)
  • (added) clang/test/utils/update-verify-tests/Inputs/leave-existing-diags.c.expected (+12)
  • (added) clang/test/utils/update-verify-tests/Inputs/multiple-errors.c (+6)
  • (added) clang/test/utils/update-verify-tests/Inputs/multiple-errors.c.expected (+9)
  • (added) clang/test/utils/update-verify-tests/Inputs/multiple-missing-errors-same-line.c (+8)
  • (added) clang/test/utils/update-verify-tests/Inputs/multiple-missing-errors-same-line.c.expected (+13)
  • (added) clang/test/utils/update-verify-tests/Inputs/no-checks.c (+3)
  • (added) clang/test/utils/update-verify-tests/Inputs/no-checks.c.expected (+4)
  • (added) clang/test/utils/update-verify-tests/Inputs/no-diags.c (+5)
  • (added) clang/test/utils/update-verify-tests/Inputs/no-diags.c.expected (+5)
  • (added) clang/test/utils/update-verify-tests/Inputs/no-expected-diags.c (+4)
  • (added) clang/test/utils/update-verify-tests/Inputs/no-expected-diags.c.expected (+4)
  • (added) clang/test/utils/update-verify-tests/Inputs/non-default-prefix.c (+5)
  • (added) clang/test/utils/update-verify-tests/Inputs/non-default-prefix.c.expected (+5)
  • (added) clang/test/utils/update-verify-tests/Inputs/update-same-line.c (+4)
  • (added) clang/test/utils/update-verify-tests/Inputs/update-same-line.c.expected (+4)
  • (added) clang/test/utils/update-verify-tests/Inputs/update-single-check.c (+4)
  • (added) clang/test/utils/update-verify-tests/Inputs/update-single-check.c.expected (+4)
  • (added) clang/test/utils/update-verify-tests/duplicate-diag.test (+4)
  • (added) clang/test/utils/update-verify-tests/infer-indentation.test (+3)
  • (added) clang/test/utils/update-verify-tests/leave-existing-diags.test (+4)
  • (added) clang/test/utils/update-verify-tests/lit.local.cfg (+25)
  • (added) clang/test/utils/update-verify-tests/multiple-errors.test (+3)
  • (added) clang/test/utils/update-verify-tests/multiple-missing-errors-same-line.test (+3)
  • (added) clang/test/utils/update-verify-tests/no-checks.test (+3)
  • (added) clang/test/utils/update-verify-tests/no-diags.test (+4)
  • (added) clang/test/utils/update-verify-tests/no-expected-diags.test (+4)
  • (added) clang/test/utils/update-verify-tests/non-default-prefix.test (+4)
  • (added) clang/test/utils/update-verify-tests/update-same-line.test (+4)
  • (added) clang/test/utils/update-verify-tests/update-single-check.test (+3)
  • (added) clang/utils/UpdateVerifyTests/core.py (+452)
  • (added) clang/utils/update-verify-tests.py (+38)
diff --git a/clang/test/utils/update-verify-tests/Inputs/duplicate-diag.c b/clang/test/utils/update-verify-tests/Inputs/duplicate-diag.c
new file mode 100644
index 00000000000000..8c7e46c6eca9c1
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/duplicate-diag.c
@@ -0,0 +1,8 @@
+void foo() {
+    // expected-error@+1{{use of undeclared identifier 'a'}}
+    a = 2; a = 2;
+    b = 2; b = 2;
+    // expected-error@+1 3{{use of undeclared identifier 'c'}}
+    c = 2; c = 2;
+    // expected-error 2{{asdf}}
+}
diff --git a/clang/test/utils/update-verify-tests/Inputs/duplicate-diag.c.expected b/clang/test/utils/update-verify-tests/Inputs/duplicate-diag.c.expected
new file mode 100644
index 00000000000000..6214ff382f4495
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/duplicate-diag.c.expected
@@ -0,0 +1,8 @@
+void foo() {
+    // expected-error@+1 2{{use of undeclared identifier 'a'}}
+    a = 2; a = 2;
+    // expected-error@+1 2{{use of undeclared identifier 'b'}}
+    b = 2; b = 2;
+    // expected-error@+1 2{{use of undeclared identifier 'c'}}
+    c = 2; c = 2;
+}
diff --git a/clang/test/utils/update-verify-tests/Inputs/infer-indentation.c b/clang/test/utils/update-verify-tests/Inputs/infer-indentation.c
new file mode 100644
index 00000000000000..0210ac35fd5cd1
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/infer-indentation.c
@@ -0,0 +1,8 @@
+void foo() {
+         //     expected-error@+1    2      {{use of undeclared identifier 'a'}}
+    a = 2; a = 2; b = 2; b = 2; c = 2;
+         //     expected-error@+1    2      {{asdf}}
+    d = 2;
+    e = 2; f = 2;                 //     expected-error    2      {{use of undeclared identifier 'e'}}
+}
+
diff --git a/clang/test/utils/update-verify-tests/Inputs/infer-indentation.c.expected b/clang/test/utils/update-verify-tests/Inputs/infer-indentation.c.expected
new file mode 100644
index 00000000000000..5c5aaeeef97acf
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/infer-indentation.c.expected
@@ -0,0 +1,11 @@
+void foo() {
+         //     expected-error@+3          {{use of undeclared identifier 'c'}}
+         //     expected-error@+2    2      {{use of undeclared identifier 'b'}}
+         //     expected-error@+1    2      {{use of undeclared identifier 'a'}}
+    a = 2; a = 2; b = 2; b = 2; c = 2;
+         //     expected-error@+1          {{use of undeclared identifier 'd'}}
+    d = 2;
+    //     expected-error@+1          {{use of undeclared identifier 'f'}}
+    e = 2; f = 2;                 //     expected-error          {{use of undeclared identifier 'e'}}
+}
+
diff --git a/clang/test/utils/update-verify-tests/Inputs/leave-existing-diags.c b/clang/test/utils/update-verify-tests/Inputs/leave-existing-diags.c
new file mode 100644
index 00000000000000..1aa8d088e97273
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/leave-existing-diags.c
@@ -0,0 +1,11 @@
+void foo() {
+    a = 2;
+    // expected-error@-1{{use of undeclared identifier 'a'}}
+    b = 2;// expected-error{{use of undeclared identifier 'b'}}
+    c = 2;
+    // expected-error@5{{use of undeclared identifier 'c'}}
+    d = 2; // expected-error-re{{use of {{.*}} identifier 'd'}}
+
+    e = 2; // error to trigger mismatch
+}
+
diff --git a/clang/test/utils/update-verify-tests/Inputs/leave-existing-diags.c.expected b/clang/test/utils/update-verify-tests/Inputs/leave-existing-diags.c.expected
new file mode 100644
index 00000000000000..6b621061bbfbbd
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/leave-existing-diags.c.expected
@@ -0,0 +1,12 @@
+void foo() {
+    a = 2;
+    // expected-error@-1{{use of undeclared identifier 'a'}}
+    b = 2;// expected-error{{use of undeclared identifier 'b'}}
+    c = 2;
+    // expected-error@5{{use of undeclared identifier 'c'}}
+    d = 2; // expected-error-re{{use of {{.*}} identifier 'd'}}
+
+    // expected-error@+1{{use of undeclared identifier 'e'}}
+    e = 2; // error to trigger mismatch
+}
+
diff --git a/clang/test/utils/update-verify-tests/Inputs/multiple-errors.c b/clang/test/utils/update-verify-tests/Inputs/multiple-errors.c
new file mode 100644
index 00000000000000..e230e0a337bf49
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/multiple-errors.c
@@ -0,0 +1,6 @@
+void foo() {
+    a = 2;
+    b = 2;
+
+    c = 2;
+}
diff --git a/clang/test/utils/update-verify-tests/Inputs/multiple-errors.c.expected b/clang/test/utils/update-verify-tests/Inputs/multiple-errors.c.expected
new file mode 100644
index 00000000000000..27dc1f30a26faf
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/multiple-errors.c.expected
@@ -0,0 +1,9 @@
+void foo() {
+    // expected-error@+1{{use of undeclared identifier 'a'}}
+    a = 2;
+    // expected-error@+1{{use of undeclared identifier 'b'}}
+    b = 2;
+
+    // expected-error@+1{{use of undeclared identifier 'c'}}
+    c = 2;
+}
diff --git a/clang/test/utils/update-verify-tests/Inputs/multiple-missing-errors-same-line.c b/clang/test/utils/update-verify-tests/Inputs/multiple-missing-errors-same-line.c
new file mode 100644
index 00000000000000..03f723d44bbe82
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/multiple-missing-errors-same-line.c
@@ -0,0 +1,8 @@
+void foo() {
+    a = 2; b = 2; c = 2;
+}
+
+void bar() {
+    x = 2; y = 2; z = 2;
+    // expected-error@-1{{use of undeclared identifier 'x'}}
+}
diff --git a/clang/test/utils/update-verify-tests/Inputs/multiple-missing-errors-same-line.c.expected b/clang/test/utils/update-verify-tests/Inputs/multiple-missing-errors-same-line.c.expected
new file mode 100644
index 00000000000000..24b57f4353d95d
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/multiple-missing-errors-same-line.c.expected
@@ -0,0 +1,13 @@
+void foo() {
+    // expected-error@+3{{use of undeclared identifier 'c'}}
+    // expected-error@+2{{use of undeclared identifier 'b'}}
+    // expected-error@+1{{use of undeclared identifier 'a'}}
+    a = 2; b = 2; c = 2;
+}
+
+void bar() {
+    x = 2; y = 2; z = 2;
+    // expected-error@-1{{use of undeclared identifier 'x'}}
+    // expected-error@-2{{use of undeclared identifier 'y'}}
+    // expected-error@-3{{use of undeclared identifier 'z'}}
+}
diff --git a/clang/test/utils/update-verify-tests/Inputs/no-checks.c b/clang/test/utils/update-verify-tests/Inputs/no-checks.c
new file mode 100644
index 00000000000000..8fd1f7cd333705
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/no-checks.c
@@ -0,0 +1,3 @@
+void foo() {
+    bar = 2;
+}
diff --git a/clang/test/utils/update-verify-tests/Inputs/no-checks.c.expected b/clang/test/utils/update-verify-tests/Inputs/no-checks.c.expected
new file mode 100644
index 00000000000000..e80548fbe50f2c
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/no-checks.c.expected
@@ -0,0 +1,4 @@
+void foo() {
+    // expected-error@+1{{use of undeclared identifier 'bar'}}
+    bar = 2;
+}
diff --git a/clang/test/utils/update-verify-tests/Inputs/no-diags.c b/clang/test/utils/update-verify-tests/Inputs/no-diags.c
new file mode 100644
index 00000000000000..66d169be439402
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/no-diags.c
@@ -0,0 +1,5 @@
+void foo() {
+    // expected-error@+1{{asdf}}
+    int a = 2;
+}
+
diff --git a/clang/test/utils/update-verify-tests/Inputs/no-diags.c.expected b/clang/test/utils/update-verify-tests/Inputs/no-diags.c.expected
new file mode 100644
index 00000000000000..05230284945702
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/no-diags.c.expected
@@ -0,0 +1,5 @@
+// expected-no-diagnostics
+void foo() {
+    int a = 2;
+}
+
diff --git a/clang/test/utils/update-verify-tests/Inputs/no-expected-diags.c b/clang/test/utils/update-verify-tests/Inputs/no-expected-diags.c
new file mode 100644
index 00000000000000..78b72e1357da76
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/no-expected-diags.c
@@ -0,0 +1,4 @@
+// expected-no-diagnostics
+void foo() {
+    a = 2;
+}
diff --git a/clang/test/utils/update-verify-tests/Inputs/no-expected-diags.c.expected b/clang/test/utils/update-verify-tests/Inputs/no-expected-diags.c.expected
new file mode 100644
index 00000000000000..d948ffce56189a
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/no-expected-diags.c.expected
@@ -0,0 +1,4 @@
+void foo() {
+    // expected-error@+1{{use of undeclared identifier 'a'}}
+    a = 2;
+}
diff --git a/clang/test/utils/update-verify-tests/Inputs/non-default-prefix.c b/clang/test/utils/update-verify-tests/Inputs/non-default-prefix.c
new file mode 100644
index 00000000000000..3d63eaf0f1b878
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/non-default-prefix.c
@@ -0,0 +1,5 @@
+void foo() {
+    a = 2; // check-error{{asdf}}
+           // expected-error@-1{ignored}}
+}
+
diff --git a/clang/test/utils/update-verify-tests/Inputs/non-default-prefix.c.expected b/clang/test/utils/update-verify-tests/Inputs/non-default-prefix.c.expected
new file mode 100644
index 00000000000000..a877f86922123d
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/non-default-prefix.c.expected
@@ -0,0 +1,5 @@
+void foo() {
+    a = 2; // check-error{{use of undeclared identifier 'a'}}
+           // expected-error@-1{ignored}}
+}
+
diff --git a/clang/test/utils/update-verify-tests/Inputs/update-same-line.c b/clang/test/utils/update-verify-tests/Inputs/update-same-line.c
new file mode 100644
index 00000000000000..5278ce0c57c319
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/update-same-line.c
@@ -0,0 +1,4 @@
+void foo() {
+    bar = 2;     //   expected-error       {{asdf}}
+}
+
diff --git a/clang/test/utils/update-verify-tests/Inputs/update-same-line.c.expected b/clang/test/utils/update-verify-tests/Inputs/update-same-line.c.expected
new file mode 100644
index 00000000000000..8ba47f788319b1
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/update-same-line.c.expected
@@ -0,0 +1,4 @@
+void foo() {
+    bar = 2;     //   expected-error       {{use of undeclared identifier 'bar'}}
+}
+
diff --git a/clang/test/utils/update-verify-tests/Inputs/update-single-check.c b/clang/test/utils/update-verify-tests/Inputs/update-single-check.c
new file mode 100644
index 00000000000000..20b011bfc3d77e
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/update-single-check.c
@@ -0,0 +1,4 @@
+void foo() {
+    // expected-error@+1{{asdf}}
+    bar = 2;
+}
diff --git a/clang/test/utils/update-verify-tests/Inputs/update-single-check.c.expected b/clang/test/utils/update-verify-tests/Inputs/update-single-check.c.expected
new file mode 100644
index 00000000000000..e80548fbe50f2c
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/Inputs/update-single-check.c.expected
@@ -0,0 +1,4 @@
+void foo() {
+    // expected-error@+1{{use of undeclared identifier 'bar'}}
+    bar = 2;
+}
diff --git a/clang/test/utils/update-verify-tests/duplicate-diag.test b/clang/test/utils/update-verify-tests/duplicate-diag.test
new file mode 100644
index 00000000000000..db4b0fd86f0817
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/duplicate-diag.test
@@ -0,0 +1,4 @@
+# RUN: cp %S/Inputs/duplicate-diag.c %t.c && not %clang_cc1 -verify %t.c 2>&1 | %update-verify-tests
+# RUN: diff --strip-trailing-cr %S/Inputs/duplicate-diag.c.expected %t.c
+# RUN: %clang_cc1 -verify %t.c
+
diff --git a/clang/test/utils/update-verify-tests/infer-indentation.test b/clang/test/utils/update-verify-tests/infer-indentation.test
new file mode 100644
index 00000000000000..bd94dce4844ebf
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/infer-indentation.test
@@ -0,0 +1,3 @@
+# RUN: cp %S/Inputs/infer-indentation.c %t.c && not %clang_cc1 -verify %t.c 2>&1 | %update-verify-tests
+# RUN: diff --strip-trailing-cr %S/Inputs/infer-indentation.c.expected %t.c
+# RUN: %clang_cc1 -verify %t.c
diff --git a/clang/test/utils/update-verify-tests/leave-existing-diags.test b/clang/test/utils/update-verify-tests/leave-existing-diags.test
new file mode 100644
index 00000000000000..8a723f157bf84a
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/leave-existing-diags.test
@@ -0,0 +1,4 @@
+# RUN: cp %S/Inputs/leave-existing-diags.c %t.c && not %clang_cc1 -verify %t.c 2>&1 | %update-verify-tests
+# RUN: diff --strip-trailing-cr %S/Inputs/leave-existing-diags.c.expected %t.c
+# RUN: %clang_cc1 -verify %t.c
+
diff --git a/clang/test/utils/update-verify-tests/lit.local.cfg b/clang/test/utils/update-verify-tests/lit.local.cfg
new file mode 100644
index 00000000000000..a0b6afccc25010
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/lit.local.cfg
@@ -0,0 +1,25 @@
+import lit.util
+
+# python 2.7 backwards compatibility
+try:
+    from shlex import quote as shell_quote
+except ImportError:
+    from pipes import quote as shell_quote
+
+if config.standalone_build:
+    # These tests require the update-verify-tests.py script from the clang
+    # source tree, so skip these tests if we are doing standalone builds.
+    config.unsupported = True
+else:
+    config.suffixes = [".test"]
+
+    script_path = os.path.join(
+        config.clang_src_dir, "utils", "update-verify-tests.py"
+    )
+    python = shell_quote(config.python_executable)
+    config.substitutions.append(
+        (
+            "%update-verify-tests",
+            "%s %s" % (python, shell_quote(script_path)),
+        )
+    )
diff --git a/clang/test/utils/update-verify-tests/multiple-errors.test b/clang/test/utils/update-verify-tests/multiple-errors.test
new file mode 100644
index 00000000000000..1fcb6b7f2ca096
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/multiple-errors.test
@@ -0,0 +1,3 @@
+# RUN: cp %S/Inputs/multiple-errors.c %t.c && not %clang_cc1 -verify %t.c 2>&1 | %update-verify-tests
+# RUN: diff --strip-trailing-cr %S/Inputs/multiple-errors.c.expected %t.c
+# RUN: %clang_cc1 -verify %t.c
diff --git a/clang/test/utils/update-verify-tests/multiple-missing-errors-same-line.test b/clang/test/utils/update-verify-tests/multiple-missing-errors-same-line.test
new file mode 100644
index 00000000000000..00338d7595cb78
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/multiple-missing-errors-same-line.test
@@ -0,0 +1,3 @@
+# RUN: cp %S/Inputs/multiple-missing-errors-same-line.c %t.c && not %clang_cc1 -verify %t.c 2>&1 | %update-verify-tests
+# RUN: diff --strip-trailing-cr %S/Inputs/multiple-missing-errors-same-line.c.expected %t.c
+# RUN: %clang_cc1 -verify %t.c
diff --git a/clang/test/utils/update-verify-tests/no-checks.test b/clang/test/utils/update-verify-tests/no-checks.test
new file mode 100644
index 00000000000000..5fdbdcbac95261
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/no-checks.test
@@ -0,0 +1,3 @@
+# RUN: cp %S/Inputs/no-checks.c %t.c && not %clang_cc1 -verify %t.c 2>&1 | %update-verify-tests
+# RUN: diff --strip-trailing-cr %S/Inputs/no-checks.c.expected %t.c
+# RUN: %clang_cc1 -verify %t.c
diff --git a/clang/test/utils/update-verify-tests/no-diags.test b/clang/test/utils/update-verify-tests/no-diags.test
new file mode 100644
index 00000000000000..825fd0219debb3
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/no-diags.test
@@ -0,0 +1,4 @@
+# RUN: cp %S/Inputs/no-diags.c %t.c && not %clang_cc1 -verify %t.c 2>&1 | %update-verify-tests
+# RUN: diff --strip-trailing-cr %S/Inputs/no-diags.c.expected %t.c
+# RUN: %clang_cc1 -verify %t.c
+
diff --git a/clang/test/utils/update-verify-tests/no-expected-diags.test b/clang/test/utils/update-verify-tests/no-expected-diags.test
new file mode 100644
index 00000000000000..be475c190da177
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/no-expected-diags.test
@@ -0,0 +1,4 @@
+# RUN: cp %S/Inputs/no-expected-diags.c %t.c && not %clang_cc1 -verify %t.c 2>&1 | %update-verify-tests
+# RUN: diff --strip-trailing-cr %S/Inputs/no-expected-diags.c.expected %t.c
+# RUN: %clang_cc1 -verify %t.c
+
diff --git a/clang/test/utils/update-verify-tests/non-default-prefix.test b/clang/test/utils/update-verify-tests/non-default-prefix.test
new file mode 100644
index 00000000000000..594dba4174d2e5
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/non-default-prefix.test
@@ -0,0 +1,4 @@
+# RUN: cp %S/Inputs/non-default-prefix.c %t.c && not %clang_cc1 -verify=check %t.c 2>&1 | %update-verify-tests --prefix check
+# RUN: diff --strip-trailing-cr %S/Inputs/non-default-prefix.c.expected %t.c
+# RUN: %clang_cc1 -verify=check %t.c
+
diff --git a/clang/test/utils/update-verify-tests/update-same-line.test b/clang/test/utils/update-verify-tests/update-same-line.test
new file mode 100644
index 00000000000000..b7e5d7a574eca5
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/update-same-line.test
@@ -0,0 +1,4 @@
+# RUN: cp %S/Inputs/update-same-line.c %t.c && not %clang_cc1 -verify %t.c 2>&1 | %update-verify-tests
+# RUN: diff --strip-trailing-cr %S/Inputs/update-same-line.c.expected %t.c
+# RUN: %clang_cc1 -verify %t.c
+
diff --git a/clang/test/utils/update-verify-tests/update-single-check.test b/clang/test/utils/update-verify-tests/update-single-check.test
new file mode 100644
index 00000000000000..b958d66b099db4
--- /dev/null
+++ b/clang/test/utils/update-verify-tests/update-single-check.test
@@ -0,0 +1,3 @@
+# RUN: cp %S/Inputs/update-single-check.c %t.c && not %clang_cc1 -verify %t.c 2>&1 | %update-verify-tests
+# RUN: diff --strip-trailing-cr %S/Inputs/update-single-check.c.expected %t.c
+# RUN: %clang_cc1 -verify %t.c
diff --git a/clang/utils/UpdateVerifyTests/core.py b/clang/utils/UpdateVerifyTests/core.py
new file mode 100644
index 00000000000000..d1350cdbb698b6
--- /dev/null
+++ b/clang/utils/UpdateVerifyTests/core.py
@@ -0,0 +1,452 @@
+import sys
+import re
+
+DEBUG = False
+
+
+def dprint(*args):
+    if DEBUG:
+        print(*args, file=sys.stderr)
+
+
+class KnownException(Exception):
+    pass
+
+
+def parse_error_category(s, prefix):
+    if "no expected directives found" in s:
+        return None
+    parts = s.split("diagnostics")
+    diag_category = parts[0]
+    category_parts = parts[0].strip().strip("'").split("-")
+    expected = category_parts[0]
+    if expected != prefix:
+        raise Exception(
+            f"expected prefix '{prefix}', but found '{expected}'. Multiple verify prefixes are not supported."
+        )
+    diag_category = category_parts[1]
+    if "seen but not expected" in parts[1]:
+        seen = True
+    elif "expected but not seen" in parts[1]:
+        seen = False
+    else:
+        raise KnownException(f"unexpected category '{parts[1]}'")
+    return (diag_category, seen)
+
+
+diag_error_re = re.compile(r"File (\S+) Line (\d+): (.+)")
+diag_error_re2 = re.compile(r"File \S+ Line \d+ \(directive at (\S+):(\d+)\): (.+)")
+
+
+def parse_diag_error(s):
+    m = diag_error_re2.match(s)
+    if not m:
+        m = diag_error_re.match(s)
+    if not m:
+        return None
+    return (m.group(1), int(m.group(2)), m.group(3))
+
+
+class Line:
+    def __init__(self, content, line_n):
+        self.content = content
+        self.diag = None
+        self.line_n = line_n
+        self.targeting_diags = []
+
+    def update_line_n(self, n):
+        self.line_n = n
+
+    def render(self):
+        if not self.diag:
+            return self.content
+        assert "{{DIAG}}" in self.content
+        res = self.content.replace("{{DIAG}}", self.diag.render())
+        if not res.strip():
+            return ""
+        return res
+
+
+class Diag:
+    def __init__(
+        self,
+        prefix,
+        diag_content,
+        category,
+        parsed_target_line_n,
+        line_is_absolute,
+        count,
+        line,
+        is_re,
+        whitespace_strings,
+        is_from_source_file,
+    ):
+        self.prefix = prefix
+        self.diag_content = diag_content
+        self.category = category
+        self.parsed_target_line_n = parsed_target_line_n
+        self.line_is_absolute = line_is_absolute
+        self.count = count
+        self.line = line
+        self.target = None
+        self.is_re = is_re
+        self.absolute_target()
+        self.whitespace_strings = whitespace_strings
+        self.is_from_source_file = is_from_source_file
+
+    def decrement_count(self):
+        self.count -= 1
+        assert self.count >= 0
+
+    def increment_count(self):
+        assert self.count >= 0
+        self.count += 1
+
+    def unset_target(self):
+        assert self.target is not None
+       ...
[truncated]

@hnrklssn hnrklssn merged commit 9ceb967 into llvm:main Sep 14, 2024
8 of 9 checks passed
@hnrklssn hnrklssn deleted the reland-update-verify-tests branch September 14, 2024 02:04
@jakeegan
Copy link
Member

Hi, --strip-trailing-cr isn't a supported option with diff on AIX. Would you be able to adapt the tests?

https://lab.llvm.org/buildbot/#/builders/64/builds/985

@hnrklssn
Copy link
Member Author

Hi, --strip-trailing-cr isn't a supported option with diff on AIX. Would you be able to adapt the tests?

https://lab.llvm.org/buildbot/#/builders/64/builds/985

This should fix it: #108871

@erichkeane
Copy link
Collaborator

As a CFE reviewer, I very much hate the existence of this script (and the CodeGen equivalents, but that is a separate discussion).

This existing means I can no longer trust that a test is a reflection of what the author intended to write, as they might have just run this and "YOLO'ed" it away. So now I have to be 100x more careful reviewing tests, which means significantly worse review bandwidth. This is the same problem I have with AI generated code/tests, as this is basically only a 1/2 step above, "Hey Google, write me a lit test for this".

@cor3ntin and @AaronBallman and @Endilll .

@hnrklssn
Copy link
Member Author

As a CFE reviewer, I very much hate the existence of this script (and the CodeGen equivalents, but that is a separate discussion).

This existing means I can no longer trust that a test is a reflection of what the author intended to write, as they might have just run this and "YOLO'ed" it away. So now I have to be 100x more careful reviewing tests, which means significantly worse review bandwidth. This is the same problem I have with AI generated code/tests, as this is basically only a 1/2 step above, "Hey Google, write me a lit test for this".

@cor3ntin and @AaronBallman and @Endilll .

That was always true, in the sense that there's nothing preventing the author from doing the exact same thing: looking at the output from the clang -verify failure and adding/removing expected diagnostics until it matches. In fact, even without this script that is the most mindless way of doing it, and I'm sure it already happens a lot.

In either way, authors should review their own changes first. I would argue that this script is significantly less harmful than any FileCheck generator script, because given an input program there is much less flexibility in how to structure the checks. With FileCheck you may not want to check every line, since some of them aren't directly relevant to the thing you're testing, instead making the test more fragile and harder to review through bloat. So creating FileCheck check lines is a bit of an art. But for clang -verify you have to add checkers for exactly the emitted diagnostics. No more or less. So the only real question is on which line to place them.

That said my main goal is not to make creating tests easier (although it of course also does that, and I don't know that I agree that it's harmful to lower the threshold to create tests), but to make it easier to batch update many/large breaking tests. We have many tests downstream that while testing features that don't yet exist upstream, are affected by new diagnostics added upstream. Handling many tests breaking all at once because of upstream changes is easier if it doesn't require a ton of sed-fu. Similar interactions happen when cherry-picking a patch from a dev branch to one or multiple release branches, that may not contain the exact same set of diagnostics. I could of course keep this script downstream, but we are not the only ones with downstream changes and I think others will find it helpful. And that's for a completely new script: keeping something like #108425 downstream only would be result in merge conflicts.

Furthermore, I have observed how existing regression tests create friction to improving phrasing of existing diagnostics. Old diagnostics often appear in many many tests, so while the code change to improving the diagnostic may be quick, you know it'll take significant time to update all of the regression tests using the old phrasing. So you don't do that drive-by improvement, because that wasn't your goal anyways. I think there are a ton of diagnostics in clang that could be improved to be clearer, but when the improvement is minor and friction is high we get stuck with the "good enough".

@Endilll
Copy link
Contributor

Endilll commented Sep 17, 2024

That was always true, in the sense that there's nothing preventing the author from doing the exact same thing: looking at the output from the clang -verify failure and adding/removing expected diagnostics until it matches. In fact, even without this script that is the most mindless way of doing it, and I'm sure it already happens a lot.

An author still has to pay some attention to the diagnostic test they are changing, and they still can notice things. Sure, they might not pick subtleties, but I think there's a fair chance to catch something really wrong. Again, we're constrained on review bandwidth, and not on making changes. As a testament to that, number of active PRs has been constantly going up ever since we switched to them, and we also have a long tail of opened reviews in Phabricator. So this little nudge to review your work helps the worst bottleneck we've been having.

But for clang -verify you have to add checkers for exactly the emitted diagnostics. No more or less. So the only real question is on which line to place them.

I don't think it's that simple at least for some tests. We have tests that run in -verify mode multiple times with different combinations of compiler options. Such tests might rely on custom prefixes (-verify=prefix1,prefix2), and updating them automatically can easily yield incorrect result. As an example how more complicated tests might look like, consider C++ DR test suite. Here's one example:

return p;
// cxx98-20-error@-1 {{cannot initialize return object of type 'const int ***' with an lvalue of type 'int ***'}}
// since-cxx23-error@-2 {{cannot initialize return object of type 'const int ***' with an rvalue of type 'int ***'}}
// expected-note@#cwg349-p1 {{in instantiation of function template specialization 'cwg349::A::operator const int ***<const int>' requested here}}

That said my main goal is not to make creating tests easier (although it of course also does that, and I don't know that I agree that it's harmful to lower the threshold to create tests), but to make it easier to batch update many/large breaking tests. We have many tests downstream that while testing features that don't yet exist upstream, are affected by new diagnostics added upstream. Handling many tests breaking all at once because of upstream changes is easier if it doesn't require a ton of sed-fu. Similar interactions happen when cherry-picking a patch from a dev branch to one or multiple release branches, that may not contain the exact same set of diagnostics. I could of course keep this script downstream, but we are not the only ones with downstream changes and I think others will find it helpful. And that's for a completely new script: keeping something like #108425 downstream only would be result in merge conflicts.

I don't intend to dismiss the pain downstream could have over this, but in my opinion it doesn't justify opening floodgates of automated test updates upstream. At the moment I don't have a strong opinion on infrastructure that supports such automated updates, like #108425 you mentioned.

Furthermore, I have observed how existing regression tests create friction to improving phrasing of existing diagnostics. Old diagnostics often appear in many many tests, so while the code change to improving the diagnostic may be quick, you know it'll take significant time to update all of the regression tests using the old phrasing.

I agree that the pain is there. But again, I don't think the end justifies the mean.

So you don't do that drive-by improvement, because that wasn't your goal anyways.

Drive-by fix that makes you update a lot of test would introduce a lot of noise into your PR anyway, making it noticeably harder to review. I believe the script addresses the wrong problem here.

@hnrklssn
Copy link
Member Author

But for clang -verify you have to add checkers for exactly the emitted diagnostics. No more or less. So the only real question is on which line to place them.

I don't think it's that simple at least for some tests. We have tests that run in -verify mode multiple times with different combinations of compiler options. Such tests might rely on custom prefixes (-verify=prefix1,prefix2), and updating them automatically can easily yield incorrect result. As an example how more complicated tests might look like, consider C++ DR test suite. Here's one example:

return p;
// cxx98-20-error@-1 {{cannot initialize return object of type 'const int ***' with an lvalue of type 'int ***'}}
// since-cxx23-error@-2 {{cannot initialize return object of type 'const int ***' with an rvalue of type 'int ***'}}
// expected-note@#cwg349-p1 {{in instantiation of function template specialization 'cwg349::A::operator const int ***<const int>' requested here}}

This script doesn't support test failures in multiple prefixes or regex matchers for that reason, because I could not find a reliable way to update those failures without manual intervention.

That said my main goal is not to make creating tests easier (although it of course also does that, and I don't know that I agree that it's harmful to lower the threshold to create tests), but to make it easier to batch update many/large breaking tests. We have many tests downstream that while testing features that don't yet exist upstream, are affected by new diagnostics added upstream. Handling many tests breaking all at once because of upstream changes is easier if it doesn't require a ton of sed-fu. Similar interactions happen when cherry-picking a patch from a dev branch to one or multiple release branches, that may not contain the exact same set of diagnostics. I could of course keep this script downstream, but we are not the only ones with downstream changes and I think others will find it helpful. And that's for a completely new script: keeping something like #108425 downstream only would be result in merge conflicts.

I don't intend to dismiss the pain downstream could have over this, but in my opinion it doesn't justify opening floodgates of automated test updates upstream. At the moment I don't have a strong opinion on infrastructure that supports such automated updates, like #108425 you mentioned.

Furthermore, I have observed how existing regression tests create friction to improving phrasing of existing diagnostics. Old diagnostics often appear in many many tests, so while the code change to improving the diagnostic may be quick, you know it'll take significant time to update all of the regression tests using the old phrasing.

I agree that the pain is there. But again, I don't think the end justifies the mean.

That's fair, we disagree on this point. I think responsible use of automatic test updaters is reasonable, and we shouldn't prevent people from using it just because it could be abused. I'm sympathetic that there may be many low quality PRs to wade through, but I'd rather let the number of open PRs continue to grow and let more people step up to the task of reviewing as it becomes more apparent that we need more reviewers. Artificially hampering developer productivity just because it's not the bottleneck is hiding the symptom rather than fixing the problem (too few reviewers in comparison to the number of PRs).

I do want to stress though, that this script handles pretty trivial updates. Yes, it doesn't force people to look at the changes they're submitting, but compared to existing scripts this is significantly less risky. I don't want to resort to whataboutism, but I do think we need to be consistent if we're going to ban test update scripts.

So you don't do that drive-by improvement, because that wasn't your goal anyways.

Drive-by fix that makes you update a lot of test would introduce a lot of noise into your PR anyway, making it noticeably harder to review.

Of course the drive-by fix would have to be submitted as a separate PR in that case, but that's quite straightforward even after the fact when you have a script like this: extract the implementation into a separate commit, and rerun the test updater on both branches to bring over the test updates to the new PR and undo the noise on the original PR.

I believe the script addresses the wrong problem here.

There's no getting around that the tests need to be updated if diagnostics are updated. Lack of reviewers doesn't mean developer issues can't also be addressed.

@AaronBallman
Copy link
Collaborator

But for clang -verify you have to add checkers for exactly the emitted diagnostics. No more or less. So the only real question is on which line to place them.

As someone who does a lot of review in Clang, I (strongly) disagree with this assertion.

You do not have to add checks for exactly the emitted diagnostics, there are times when it's more appropriate to drop some part of the message. In fact, we used to have guidance that only the very first instance of the diagnostic should be spelled out fully, and all subsequent ones should be limited to just the bare minimum needed to identify what the diagnostic is. That's why there are hundreds of instances of things like // expected-note {{here}}. What's more, there is a rich set of functionality provided by the diagnostic verifier that we often request patch authors use, but is stylistic. It's important to use regular expressions and bookmarks to help make the tests readable and reviewable, which is why they're used thousands of times across Clang's lit tests. Finally, as @Endilll pointed out, there are tests which have multiple (sometime numerous) RUN lines where the output is expected to differ between RUN lines; we typically ask authors to use custom verify prefixes for those tests, but crafting those can sometimes be an art form (and other times is very straightforward).

Writing tests is not a mechanical operation, it requires you to think about what you need to test as well as think about how you test it. I think the same is less true for things like testing LLVM IR because that tends to be more routine pattern matching, which typically requires less stylistic choices.

That said my main goal is not to make creating tests easier (although it of course also does that, and I don't know that I agree that it's harmful to lower the threshold to create tests), but to make it easier to batch update many/large breaking tests.

This is a serious problem we have, so working on ways to improve that is greatly appreciated! While there's some pretty significant push-back about your current approach, perhaps there is an alternative which makes folks feel more comfortable. For example, the script could explicitly refuse to add new diagnostic comments, only remove or update existing comments. That would make it require more developer effort when adding a new diagnostic and wanting to update tests, but that's a feature not a bug because we want the developer to think critically about whether the diagnostic actually makes sense everywhere it fires before they put up the PR. It doesn't solve all problems (like style choices), but it at least helps in the easier cases without the risk of people abusing it accidentally.

This existing means I can no longer trust that a test is a reflection of what the author intended to write, as they might have just run this and "YOLO'ed" it away.

FWIW, I share this concern -- developers could do this anyway by just writing the comments by hand, but a script encourages the developer to trust the output the compiler gives because it's easy to assume problems will be caught by review. This slows down review throughput, increases the burden on reviewers, and given how often people update existing diagnostics, seems like a poor tradeoff (then again, maybe more people will update existing diagnostics if that wasn't so painful to do).

Given the amount of push-back, I think we should probably revert the changes here so we can discuss with a bit wider of an audience than weighed in on the original PR.

@erichkeane
Copy link
Collaborator

You do not have to add checks for exactly the emitted diagnostics, there are times when it's more appropriate to drop some part of the message. In fact, we used to have guidance that only the very first instance of the diagnostic should be spelled out fully, and all subsequent ones should be limited to just the bare minimum needed to identify what the diagnostic is. That's why there are hundreds of instances of things like // expected-note {{here}}. What's more, there is a rich set of functionality provided by the diagnostic verifier that we often request patch authors use, but is stylistic. It's important to use regular expressions and bookmarks to help make the tests readable and reviewable, which is why they're used thousands of times across Clang's lit tests. Finally, as @Endilll pointed out, there are tests which have multiple (sometime numerous) RUN lines where the output is expected to differ between RUN lines; we typically ask authors to use custom verify prefixes for those tests, but crafting those can sometimes be an art form (and other times is very straightforward).

So another huge side note: I've been pushing REALLY hard for us to use the bookmark feature more aggressively for notes. So I've been asking authors to do THAT as well.

@hnrklssn
Copy link
Member Author

In fact, we used to have guidance that only the very first instance of the diagnostic should be spelled out fully, and all subsequent ones should be limited to just the bare minimum needed to identify what the diagnostic is. That's why there are hundreds of instances of things like // expected-note {{here}}.

Side note: In general I think the fact that these tests pass if the check implicitly match on a subset of the diagnostic does more harm than good. When test authors actively want to do that the can always use the regex feature, but I think the default should be to match the exact string. When there are similar diagnostics or shorter/longer version it's too much of a footgun to accidentally have shorter diagnostic match the longer version as well. When I review, i prefer the test to give me a picture as close to the exact output as possible, so I can evaluate the user friendliness. I think it's a massive shortcoming that we don't check the order in which notes are emitted, or which range is highlighted, because reviewers don't get the full picture without checking out the branch and manually compiling test cases.

Finally, as @Endilll pointed out, there are tests which have multiple (sometime numerous) RUN lines where the output is expected to differ between RUN lines; we typically ask authors to use custom verify prefixes for those tests, but crafting those can sometimes be an art form (and other times is very straightforward).

Again, this script doesn't run on tests with multiple prefixes. It's meant for test cases which are trivial to update, of which there are many.

Writing tests is not a mechanical operation, it requires you to think about what you need to test as well as think about how you test it. I think the same is less true for things like testing LLVM IR because that tends to be more routine pattern matching, which typically requires less stylistic choices.

Sometimes it is, sometimes it isn't. Even when it isn't just having the checks added to the test file in approximately the final location for me to move around is a massive boon compared to manually adding them all.

Given the amount of push-back, I think we should probably revert the changes here so we can discuss with a bit wider of an audience than weighed in on the original PR.

Yup, will revert.

So another huge side note: I've been pushing REALLY hard for us to use the bookmark feature more aggressively for notes. So I've been asking authors to do THAT as well.

I appreciate that, it can be really tricky to get a feel for the flow of notes, especially when a diagnostic has multiple notes attached. The test sadly doesn't check that the order of the notes emitted is actually sensical, for example. I don't know the solution, but I think there is some inherent flaw when it comes to how -verify tests operate currently, in that you need to trust that the author formatted the test to accurately reflect reality.

@erichkeane
Copy link
Collaborator

I don't know the solution, but I think there is some inherent flaw when it comes to how -verify tests operate currently, in that you need to trust that the author formatted the test to accurately reflect reality.

In part why I push for bookmarks, it makes it clear on the ordering. And also why a tool like this makes me afraid, it makes me trust the authors a bunch less.

@AaronBallman
Copy link
Collaborator

In fact, we used to have guidance that only the very first instance of the diagnostic should be spelled out fully, and all subsequent ones should be limited to just the bare minimum needed to identify what the diagnostic is. That's why there are hundreds of instances of things like // expected-note {{here}}.

Side note: In general I think the fact that these tests pass if the check implicitly match on a subset of the diagnostic does more harm than good.

+1, it's why I stopped recommending this approach in my reviews.

Writing tests is not a mechanical operation, it requires you to think about what you need to test as well as think about how you test it. I think the same is less true for things like testing LLVM IR because that tends to be more routine pattern matching, which typically requires less stylistic choices.

Sometimes it is, sometimes it isn't. Even when it isn't just having the checks added to the test file in approximately the final location for me to move around is a massive boon compared to manually adding them all.

Sure, it's a time-saver when used as intended. I think the push-back is largely coming from recognition that not everyone uses tools as they are intended and we need to balance making it easy to update a large amount of tests (something which occurs infrequently) against the risks of (even accidental) tool misuse.

Given the amount of push-back, I think we should probably revert the changes here so we can discuss with a bit wider of an audience than weighed in on the original PR.

Yup, will revert.

Thank you!

@erichkeane
Copy link
Collaborator

+1, it's why I stopped recommending this approach in my reviews.

While I agree for the most part, there are quite a few diagnostics that are absurdly verbose (particularly ones with "did you mean to FLOOP it?"). I think partial matches are good/fine, but they need enough of the diag to not be ambiguous.

@hnrklssn
Copy link
Member Author

+1, it's why I stopped recommending this approach in my reviews.

While I agree for the most part, there are quite a few diagnostics that are absurdly verbose (particularly ones with "did you mean to FLOOP it?"). I think partial matches are good/fine, but they need enough of the diag to not be ambiguous.

I don't necessarily oppose the use of truncated checks where it makes sense, I oppose the implicit default on the matcher's part. An expected-note-re{{.*FLOOP.*}} would be a reasonable explicit opt-in. A new mode like expected-note-part{{FLOOP}} would be even better, making the old mode behave like expected-note-re{{^FLOOP$}}. As an (admittedly extreme) example of footgunnery with this implicit behaviour, we had bug downstream where there were two versions of an error, something like this: cannot use FOO in context asdf vs cannot use FOO in context asdf; did you mean to use BAR?. The ; did you mean to use BAR? version was accidentally emitted in some cases where it didn't make sense to do so, but the tests still passed because the diagnostic did contain cannot use FOO in context asdf.

hnrklssn added a commit to hnrklssn/llvm-project that referenced this pull request Sep 18, 2024
This reverts commits c96ee0f
and 9ceb967.

Discussion in github PR llvm#108658.
hnrklssn added a commit that referenced this pull request Sep 18, 2024
This reverts commits c96ee0f and
9ceb967.

Discussion in github PR #108658.
tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024
This reverts commits c96ee0f and
9ceb967.

Discussion in github PR llvm#108658.
@jroelofs
Copy link
Contributor

The ; did you mean to use BAR? version was accidentally emitted in some cases where it didn't make sense to do so, but the tests still passed because the diagnostic did contain cannot use FOO in context asdf.

There's a FileCheck flag to enforce this: --match-full-lines

@hnrklssn
Copy link
Member Author

The ; did you mean to use BAR? version was accidentally emitted in some cases where it didn't make sense to do so, but the tests still passed because the diagnostic did contain cannot use FOO in context asdf.

There's a FileCheck flag to enforce this: --match-full-lines

Clang diagnostic tests don't use FileCheck, instead clang has a -verify flag.

@jroelofs
Copy link
Contributor

oh, right. nvm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants