Set up to use pytest for our data-driven tests, and switch testcheck over to it (#1944)

gnprice · web-flow · commit 255fb6170aba · 2016-07-27T16:08:50.000-07:00
This is a step toward #1673 (switching entirely to pytest from myunit and runtests.py), using some of the ideas developed in @kirbyfan64's PR #1723.

Both `py.test` with no arguments and `py.test mypy/test/testcheck.py` work just as you'd hope, while `./runtests.py` continues to run all the tests.

The output is very similar to the myunit output. It doesn't spew voluminous stack traces or other verbose data, and it continues using `assert_string_arrays_equal` to produce nicely-formatted comparisons when e.g. type-checker error messages differ. On error it even includes the filename and line number for the test case itself, which I've long wanted, and I think pytest's separator lines and coloration make the output slightly easier to read when there are multiple failures.

The `-i` option from myunit is straightforwardly ported over as `--update-data`, giving it a longer name because it feels like the kind of heavyweight and uncommon operation that deserves such a name. It'd be equally straightforward to port over `-u`, but in the presence of source control I think `--update-data` does the job on its own.

One small annoyance is that if there's a failure early in a test run, pytest doesn't print the detailed report on that failure until the whole run is over. This has often annoyed me in using pytest on other projects; useful workarounds include passing `-x` to make it stop at the first failure, `-k` to filter the set of tests to be run, or (especially with our tests where errors often go through `assert_string_arrays_equal`) `-s` to let stdout and stderr pass through immediately. For interactive use I think it'd nearly always be preferable to do what myunit does by immediately printing the detailed information, so I may come back to this later to try to get pytest to do that.

We don't yet take advantage of `xdist` to parallelize within a `py.test` run (though `xdist` works for me out of the box in initial cursory testing). For now we just stick with the `runtests.py` parallelization, so we set up a separate `py.test` command for each test module.
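
To illustrate the "one `py.test` command per test module" arrangement, here is a rough sketch of how such commands could be assembled. It is not the actual `runtests.py` code: `build_pytest_commands` is a hypothetical helper, and the real driver additionally applies its name filters and hands each command to its own task runner.

```python
import sys
from typing import List


def build_pytest_commands(test_files: List[str],
                          extra_args: List[str]) -> List[List[str]]:
    """Return one pytest invocation per test module.

    Hypothetical helper for illustration only. Each command runs a
    single file, so an outer driver can schedule the commands in
    parallel without needing xdist.
    """
    return [[sys.executable, '-m', 'pytest', path] + extra_args
            for path in test_files]


# Only testcheck has been converted so far, so there is a single command.
commands = build_pytest_commands(['mypy/test/testcheck.py'],
                                 ['-v', '-k', 'MethodCall'])
for cmd in commands:
    print(' '.join(cmd))
```

Keeping one command per module means adding a converted module only requires adding its path to the list.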
diff --git a/README.md b/README.md
@@ -184,19 +184,26 @@ To run all tests, run the script `runtests.py` in the mypy repository:
 Note that some tests will be disabled for older python versions.
 
 This will run all tests, including integration and regression tests,
-and will type check mypy and verify that all stubs are valid. You can also
-run unit tests only, which run pretty quickly:
-
-    $ ./runtests.py unit-test
+and will type check mypy and verify that all stubs are valid.
 
 You can run a subset of test suites by passing positive or negative
 filters:
 
     $ ./runtests.py lex parse -x lint -x stub
 
-If you want to run individual unit tests, you can run `myunit` directly, or
+For example, to run unit tests only, which run pretty quickly:
+
+    $ ./runtests.py unit-test pytest
+
+The unit test suites are driven by a mixture of test frameworks:
+mypy's own `myunit` framework, and `pytest`, which we're in the
+process of migrating to.  For finer control over which unit tests are
+run and how, you can run `py.test` or `scripts/myunit` directly, or
 pass inferior arguments via `-a`:
 
+    $ py.test mypy/test/testcheck.py -v -k MethodCall
+    $ ./runtests.py -v 'pytest mypy/test/testcheck' -a -v -a -k -a MethodCall
+
     $ PYTHONPATH=$PWD scripts/myunit -m mypy.test.testlex -v '*backslash*'
     $ ./runtests.py mypy.test.testlex -a -v -a '*backslash*'
 
diff --git a/conftest.py b/conftest.py
@@ -0,0 +1,3 @@
+pytest_plugins = [
+    'mypy.test.data',
+]
diff --git a/mypy/myunit/__init__.py b/mypy/myunit/__init__.py
@@ -14,8 +14,6 @@
 is_quiet = False
 patterns = []  # type: List[str]
 times = []  # type: List[Tuple[float, str]]
-APPEND_TESTCASES = ''
-UPDATE_TESTCASES = False
 
 
 class AssertionFailure(Exception):
@@ -199,7 +197,6 @@ def __init__(self, suites: List[Suite]) -> None:
 
 def main(args: List[str] = None) -> None:
     global patterns, is_verbose, is_quiet
-    global APPEND_TESTCASES, UPDATE_TESTCASES
     if not args:
         args = sys.argv[1:]
     is_verbose = False
@@ -213,12 +210,6 @@ def main(args: List[str] = None) -> None:
             is_verbose = True
         elif a == '-q':
             is_quiet = True
-        elif a == '-u':
-            APPEND_TESTCASES = '.new'
-            UPDATE_TESTCASES = True
-        elif a == '-i':
-            APPEND_TESTCASES = ''
-            UPDATE_TESTCASES = True
         elif a == '-m':
             i += 1
             if i == len(args):
@@ -227,7 +218,7 @@ def main(args: List[str] = None) -> None:
         elif not a.startswith('-'):
             patterns.append(a)
         else:
-            sys.exit('Usage: python -m mypy.myunit [-v] [-q] [-u | -i]'
+            sys.exit('Usage: python -m mypy.myunit [-v] [-q]'
                     + ' -m mypy.test.module [-m mypy.test.module ...] [filter ...]')
         i += 1
     if len(patterns) == 0:
diff --git a/mypy/test/collect.py b/mypy/test/collect.py
diff --git a/mypy/test/data.py b/mypy/test/data.py
@@ -6,21 +6,26 @@
 from os import remove, rmdir
 import shutil
 
+import pytest  # type: ignore  # no pytest in typeshed
 from typing import Callable, List, Tuple, Set, Optional
 
 from mypy.myunit import TestCase, SkipTestCaseException
 
 
 def parse_test_cases(
         path: str,
-        perform: Callable[['DataDrivenTestCase'], None],
+        perform: Optional[Callable[['DataDrivenTestCase'], None]],
         base_path: str = '.',
         optional_out: bool = False,
         include_path: str = None,
         native_sep: bool = False) -> List['DataDrivenTestCase']:
     """Parse a file with test case descriptions.
 
     Return an array of test cases.
+
+    NB this function and DataDrivenTestCase are shared between the
+    myunit and pytest codepaths -- if something looks redundant,
+    that's likely the reason.
     """
 
     if not include_path:
@@ -336,3 +341,77 @@ def fix_win_path(line: str) -> str:
         filename, lineno, message = m.groups()
         return '{}:{}{}'.format(filename.replace('/', '\\'),
                                 lineno or '', message)
+
+
+##
+#
+# pytest setup
+#
+##
+
+
+def pytest_addoption(parser):
+    group = parser.getgroup('mypy')
+    group.addoption('--update-data', action='store_true', default=False,
+                    help='Update test data to reflect actual output'
+                         ' (supported only for certain tests)')
+
+
+def pytest_pycollect_makeitem(collector, name, obj):
+    if not isinstance(obj, type) or not issubclass(obj, DataSuite):
+        return None
+    return MypyDataSuite(name, parent=collector)
+
+
+class MypyDataSuite(pytest.Class):
+    def collect(self):
+        for case in self.obj.cases():
+            yield MypyDataCase(case.name, self, case)
+
+
+class MypyDataCase(pytest.Item):
+    def __init__(self, name: str, parent: MypyDataSuite, obj: DataDrivenTestCase) -> None:
+        self.skip = False
+        if name.endswith('-skip'):
+            self.skip = True
+            name = name[:-len('-skip')]
+
+        super().__init__(name, parent)
+        self.obj = obj
+
+    def runtest(self):
+        if self.skip:
+            pytest.skip()
+        update_data = self.config.getoption('--update-data', False)
+        self.parent.obj(update_data=update_data).run_case(self.obj)
+
+    def setup(self):
+        self.obj.set_up()
+
+    def teardown(self):
+        self.obj.tear_down()
+
+    def reportinfo(self):
+        return self.obj.file, self.obj.line, self.obj.name
+
+    def repr_failure(self, excinfo):
+        if excinfo.errisinstance(SystemExit):
+            # We assume that before doing exit() (which raises SystemExit) we've printed
+            # enough context about what happened so that a stack trace is not useful.
+            # In particular, uncaught exceptions during semantic analysis or type checking
+            # call exit() and they already print out a stack trace.
+            excrepr = excinfo.exconly()
+        else:
+            self.parent._prunetraceback(excinfo)
+            excrepr = excinfo.getrepr(style='short')
+
+        return "data: {}:{}\n{}".format(self.obj.file, self.obj.line, excrepr)
+
+
+class DataSuite:
+    @classmethod
+    def cases(cls) -> List[DataDrivenTestCase]:
+        return []
+
+    def run_case(self, testcase: DataDrivenTestCase) -> None:
+        raise NotImplementedError
diff --git a/mypy/test/helpers.py b/mypy/test/helpers.py
@@ -85,9 +85,8 @@ def assert_string_arrays_equal(expected: List[str], actual: List[str],
         raise AssertionFailure(msg)
 
 
-def update_testcase_output(testcase: DataDrivenTestCase, output: List[str], append: str) -> None:
+def update_testcase_output(testcase: DataDrivenTestCase, output: List[str]) -> None:
     testcase_path = os.path.join(testcase.old_cwd, testcase.file)
-    newfile = testcase_path + append
     data_lines = open(testcase_path).read().splitlines()
     test = '\n'.join(data_lines[testcase.line:testcase.lastline])
 
@@ -111,7 +110,7 @@ def update_testcase_output(testcase: DataDrivenTestCase, output: List[str], appe
 
     data_lines[testcase.line:testcase.lastline] = [test]
     data = '\n'.join(data_lines)
-    with open(newfile, 'w') as f:
+    with open(testcase_path, 'w') as f:
         print(data, file=f)
 
 
diff --git a/mypy/test/testcheck.py b/mypy/test/testcheck.py
@@ -9,11 +9,10 @@
 from typing import Tuple, List, Dict, Set
 
 from mypy import build, defaults
-import mypy.myunit  # for mutable globals (ick!)
 from mypy.build import BuildSource, find_module_clear_caches
-from mypy.myunit import Suite, AssertionFailure
+from mypy.myunit import AssertionFailure
 from mypy.test.config import test_temp_dir, test_data_prefix
-from mypy.test.data import parse_test_cases, DataDrivenTestCase
+from mypy.test.data import parse_test_cases, DataDrivenTestCase, DataSuite
 from mypy.test.helpers import (
     assert_string_arrays_equal, normalize_error_messages,
     testcase_pyversion, update_testcase_output,
@@ -67,42 +66,45 @@
 ]
 
 
-class TypeCheckSuite(Suite):
+class TypeCheckSuite(DataSuite):
+    def __init__(self, *, update_data=False):
+        self.update_data = update_data
 
-    def cases(self) -> List[DataDrivenTestCase]:
+    @classmethod
+    def cases(cls) -> List[DataDrivenTestCase]:
         c = []  # type: List[DataDrivenTestCase]
         for f in files:
             c += parse_test_cases(os.path.join(test_data_prefix, f),
-                                  self.run_test, test_temp_dir, True)
+                                  None, test_temp_dir, True)
         return c
 
-    def run_test(self, testcase: DataDrivenTestCase) -> None:
+    def run_case(self, testcase: DataDrivenTestCase) -> None:
         incremental = 'incremental' in testcase.name.lower() or 'incremental' in testcase.file
         optional = 'optional' in testcase.file
         if incremental:
             # Incremental tests are run once with a cold cache, once with a warm cache.
             # Expect success on first run, errors from testcase.output (if any) on second run.
             # We briefly sleep to make sure file timestamps are distinct.
             self.clear_cache()
-            self.run_test_once(testcase, 1)
+            self.run_case_once(testcase, 1)
             time.sleep(0.1)
-            self.run_test_once(testcase, 2)
+            self.run_case_once(testcase, 2)
         elif optional:
             try:
                 experiments.STRICT_OPTIONAL = True
-                self.run_test_once(testcase)
+                self.run_case_once(testcase)
             finally:
                 experiments.STRICT_OPTIONAL = False
         else:
-            self.run_test_once(testcase)
+            self.run_case_once(testcase)
 
     def clear_cache(self) -> None:
         dn = defaults.MYPY_CACHE
 
         if os.path.exists(dn):
             shutil.rmtree(dn)
 
-    def run_test_once(self, testcase: DataDrivenTestCase, incremental=0) -> None:
+    def run_case_once(self, testcase: DataDrivenTestCase, incremental=0) -> None:
         find_module_clear_caches()
         program_text = '\n'.join(testcase.input)
         module_name, program_name, program_text = self.parse_module(program_text)
@@ -140,8 +142,8 @@ def run_test_once(self, testcase: DataDrivenTestCase, incremental=0) -> None:
             a = e.messages
         a = normalize_error_messages(a)
 
-        if output != a and mypy.myunit.UPDATE_TESTCASES:
-            update_testcase_output(testcase, a, mypy.myunit.APPEND_TESTCASES)
+        if output != a and self.update_data:
+            update_testcase_output(testcase, a)
 
         assert_string_arrays_equal(
             output, a,
diff --git a/mypy/test/update.py b/mypy/test/update.py
diff --git a/mypy/waiter.py b/mypy/waiter.py
@@ -281,6 +281,18 @@ def parse_test_stats_from_output(output: str, fail_type: Optional[str]) -> Tuple
     Return tuple (number of tests, number of test failures). Default
     to the entire task representing a single test as a fallback.
     """
+
+    # pytest
+    m = re.search('^=+ (.*) in [0-9.]+ seconds =+\n\Z', output, re.MULTILINE)
+    if m:
+        counts = {}
+        for part in m.group(1).split(', '):  # e.g., '3 failed, 32 passed, 345 deselected'
+            count, key = part.split()
+            counts[key] = int(count)
+        return (sum(c for k, c in counts.items() if k != 'deselected'),
+                counts.get('failed', 0))
+
+    # myunit
     m = re.search('^([0-9]+)/([0-9]+) test cases failed(, ([0-9]+) skipped)?.$', output,
                   re.MULTILINE)
     if m:
@@ -289,6 +301,7 @@ def parse_test_stats_from_output(output: str, fail_type: Optional[str]) -> Tuple
                   re.MULTILINE)
     if m:
         return int(m.group(1)), 0
+
     # Couldn't find test counts, so fall back to single test per tasks.
     if fail_type is not None:
         return 1, 1
diff --git a/pytest.ini b/pytest.ini
@@ -0,0 +1,11 @@
+[pytest]
+# testpaths is new in 2.8
+minversion = 2.8
+
+testpaths = mypy/test
+
+python_files = test*.py
+
+# empty patterns for default python collector, to stick to our plugin's collector
+python_classes =
+python_functions =
diff --git a/runtests.py b/runtests.py
@@ -93,6 +93,13 @@ def add_mypy_package(self, name: str, packagename: str) -> None:
     def add_mypy_string(self, name: str, *args: str, cwd: Optional[str] = None) -> None:
         self.add_mypy_cmd(name, ['-c'] + list(args), cwd=cwd)
 
+    def add_pytest(self, name: str, pytest_args: List[str]) -> None:
+        full_name = 'pytest %s' % name
+        if not self.allow(full_name):
+            return
+        args = [sys.executable, '-m', 'pytest'] + pytest_args
+        self.waiter.add(LazySubprocess(full_name, args, env=self.env))
+
     def add_python(self, name: str, *args: str, cwd: Optional[str] = None) -> None:
         name = 'run %s' % name
         if not self.allow(name):
@@ -187,6 +194,16 @@ def add_imports(driver: Driver) -> None:
         driver.add_flake8('module %s' % mod, f)
 
 
+PYTEST_FILES = ['mypy/test/{}.py'.format(name) for name in [
+    'testcheck',
+]]
+
+
+def add_pytest(driver: Driver) -> None:
+    for f in PYTEST_FILES:
+        driver.add_pytest(f, [f] + driver.arglist)
+
+
 def add_myunit(driver: Driver) -> None:
     for f in find_files('mypy', prefix='test', suffix='.py'):
         mod = file_to_module(f)
@@ -199,6 +216,9 @@ def add_myunit(driver: Driver) -> None:
             # parsing tests separately since they are much slower than
             # proper unit tests.
             pass
+        elif f in PYTEST_FILES:
+            # This module has been converted to pytest; don't try to use myunit.
+            pass
         else:
             driver.add_python_mod('unit-test %s' % mod, 'mypy.myunit', '-m', mod, *driver.arglist)
 
@@ -362,6 +382,7 @@ def main() -> None:
     add_cmdline(driver)
     add_basic(driver)
     add_selftypecheck(driver)
+    add_pytest(driver)
     add_myunit(driver)
     add_imports(driver)
     add_stubs(driver)
diff --git a/test-requirements.txt b/test-requirements.txt
@@ -1,2 +1,3 @@
 flake8
 typed-ast
+pytest>=2.8
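
The `mypy/waiter.py` change above recovers test counts from pytest's final summary line. As a standalone sketch of that idea (the function name `parse_pytest_summary` and the sample output are made up for illustration, and the summary format assumed here is the `... in N seconds` form that the commit's regex targets; other pytest versions may print it differently):

```python
import re
from typing import Dict, Optional, Tuple

# Same pattern as in the diff, written as a raw string.
SUMMARY_RE = re.compile(r'^=+ (.*) in [0-9.]+ seconds =+\n\Z', re.MULTILINE)


def parse_pytest_summary(output: str) -> Optional[Tuple[int, int]]:
    """Return (tests run, tests failed), or None if no summary line is found."""
    m = SUMMARY_RE.search(output)
    if m is None:
        return None
    counts = {}  # type: Dict[str, int]
    # The captured group looks like '3 failed, 32 passed, 345 deselected'.
    for part in m.group(1).split(', '):
        count, key = part.split()
        counts[key] = int(count)
    # Deselected tests were never run, so they don't count toward the total.
    total = sum(c for k, c in counts.items() if k != 'deselected')
    return total, counts.get('failed', 0)


sample = ('mypy/test/testcheck.py ...F..\n'
          '===== 3 failed, 32 passed, 345 deselected in 12.34 seconds =====\n')
print(parse_pytest_summary(sample))  # (35, 3)
```

Because the pattern ends in `\n\Z`, it only matches when the summary is the very last line of the output; anything printed after it makes the function return `None`, which corresponds to the diff falling through to the myunit patterns.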
