Commit ff2523a

Author: Roman Inflianskas

RFC: Add MypyTypeInferenceProvider

This change is an RFC (please read the whole commit message).

Add `MypyTypeInferenceProvider` as an alternative to `TypeInferenceProvider`. The provider infers types using mypy as a library. The only requirement is to have the latest mypy installed.

The inferred types are mypy types: mypy's type system is well designed, using it directly avoids a conversion step, and it keeps the implementation simple. For compatibility and extensibility reasons, these types are stored in a separate field, `MypyType.mypy_type`.

Let's assume we have the following code in the file `x.py`, which we want to inspect:

```python
x = [42]
s = set()
from enum import Enum
class E(Enum):
    f = "f"
e = E.f
```

Then, to play with mypy types, one can use code like:

```python
import libcst as cst
from libcst.metadata import MypyTypeInferenceProvider

filename = "x.py"
module = cst.parse_module(open(filename).read())
cache = MypyTypeInferenceProvider.gen_cache(".", [filename])[filename]
wrapper = cst.MetadataWrapper(
    module,
    cache={MypyTypeInferenceProvider: cache},
)
mypy_type = wrapper.resolve(MypyTypeInferenceProvider)
x_name_node = wrapper.module.body[0].body[0].targets[0].target
set_call_node = wrapper.module.body[1].body[0].value
e_name_node = wrapper.module.body[-1].body[0].targets[0].target
print(mypy_type[x_name_node])
# prints: builtins.list[builtins.int]
print(mypy_type[x_name_node].fullname)
# prints: builtins.list[builtins.int]
print(mypy_type[x_name_node].mypy_type.type.fullname)
# prints: builtins.list
print(mypy_type[x_name_node].mypy_type.args)
# prints: (builtins.int,)
print(mypy_type[x_name_node].mypy_type.type.bases[0].type.fullname)
# prints: typing.MutableSequence
print(mypy_type[set_call_node])
# prints: builtins.set
print("issuperset" in mypy_type[set_call_node].mypy_type.names)
# prints: True
print(mypy_type[set_call_node.func])
# prints: typing.Type[builtins.set]
print(mypy_type[e_name_node].mypy_type.type.is_enum)
# prints: True
```

Why?

1. `TypeInferenceProvider` requires pyre (`pyre-check` on PyPI) to be installed. mypy is more popular than pyre. If the organization already uses mypy (which is almost always the case), it may be difficult to convince colleagues (including the security team) that "we need yet another type checker". `MypyTypeInferenceProvider` requires only the latest mypy.
2. Even though it is possible to run pyre without installing watchman, this is not advertised. Installing watchman is not always possible because of system requirements, or because of security requirements like "we install only our favorite GNU/Linux distribution packages".
3. Using `TypeInferenceProvider` requires running `pyre start` before the execution and `pyre stop` afterwards. This may be inconvenient, especially when pyre was not used before.
4. Types produced by pyre in `TypeInferenceProvider` are just strings. For example, it is not easy to infer that some variable is an enum instance. `MypyTypeInferenceProvider` makes it easy, see the code above.

Drawbacks:

1. Speed. mypy is slower than pyre, so `MypyTypeInferenceProvider` is slower than `TypeInferenceProvider`. How to partially solve this:
   1. Implement AST caching in mypy. It may be difficult, but it would speed up all projects that use this functionality.
   2. Implement caching of inferred types inside LibCST. As far as I know, no caching at all is implemented inside LibCST, and that is a prerequisite for caching inferred types, so the task is big.
   3. Implement a LibCST CST to mypy AST conversion. I am not sure whether this is possible at all. Even if it is possible, the task is huge.
2. Two providers doing similar things will be present in LibCST. This can lead to a situation where two type checkers must be installed to get all codemods from the library running.

Alternatives considered:

1. Put `MypyTypeInferenceProvider` inside a separate library (say, LibCST-mypy, or `libcst-mypy` on PyPI). This would explicitly separate `MypyTypeInferenceProvider` from the rest of LibCST. Drawbacks:
   1. The need to maintain a separate library.
   2. Limited fame (people need to know that the library exists).
   3. Since some codemods cannot be implemented easily without the library, for example an `if-elif-else` to `match` converter (it needs powerful type inference), they are doomed not to be shipped with LibCST, which makes the latter less attractive for end users.
2. Implement a base class for inferred types that inherits from `str` (to keep compatibility with the existing codebase), plus a mechanism for dynamically selecting the type checker behind `TypeInferenceProvider` (mypy or pyre; the user can do this via an environment variable). If the code inside LibCST requires only shallow type information (so plain `str` is enough), then the code can run with any type checker. The remaining code (such as the `if-elif-else` to `match` converter) will still require mypy.

Misc: The code does not lint in my environment; for some reason `pyre check` cannot find the `mypy` library.

Related to:

* Instagram#451
* pyastrx/pyastrx#40
* python/mypy#12513
* python/mypy#4868

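
The second alternative above (a `str`-based inferred type plus a type checker selected through an environment variable) could look roughly like the sketch below. It is only an illustration under stated assumptions: the names `InferredType`, `selected_checker` and the variable `LIBCST_TYPECHECKER` are hypothetical and are not part of this commit or of LibCST.

```python
import os
from typing import Any, Mapping, Optional


class InferredType(str):
    """Hypothetical base class: compares and prints like the plain type string
    that existing code expects, but can carry checker-specific data."""

    def __new__(
        cls, fullname: str, checker: str = "pyre", rich: Optional[Any] = None
    ) -> "InferredType":
        obj = super().__new__(cls, fullname)
        obj.checker = checker  # which type checker produced this type
        obj.rich = rich  # e.g. a mypy type object; None for string-only types
        return obj


def selected_checker(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the checker name from the hypothetical LIBCST_TYPECHECKER variable."""
    env = os.environ if env is None else env
    name = env.get("LIBCST_TYPECHECKER", "pyre").lower()
    if name not in ("pyre", "mypy"):
        raise ValueError(f"unsupported type checker: {name!r}")
    return name


t = InferredType("builtins.int", checker="mypy")
print(t == "builtins.int", t.checker)
```

Code that only needs the type name keeps treating the value as a `str`; code that needs deep information would check `checker` and use `rich`.
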
1 parent f668e88 commit ff2523a

5 files changed, +309 -0 lines changed

libcst/metadata/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,7 @@
     ExpressionContextProvider,
 )
 from libcst.metadata.full_repo_manager import FullRepoManager
+from libcst.metadata.mypy_type_inference_provider import MypyTypeInferenceProvider
 from libcst.metadata.name_provider import (
     FullyQualifiedNameProvider,
     QualifiedNameProvider,
@@ -74,6 +75,7 @@
     "ClassScope",
     "ComprehensionScope",
     "ScopeProvider",
+    "MypyTypeInferenceProvider",
     "ParentNodeProvider",
     "QualifiedName",
     "QualifiedNameSource",

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
+
+from pathlib import Path
+from typing import Dict, List, Mapping, Optional, TYPE_CHECKING
+
+import libcst as cst
+from libcst._position import CodeRange
+from libcst.helpers import calculate_module_and_package
+from libcst.metadata.base_provider import BatchableMetadataProvider
+from libcst.metadata.position_provider import PositionProvider
+
+try:
+    import mypy
+
+    MYPY_INSTALLED = True
+except ImportError:
+    MYPY_INSTALLED = False
+
+
+if TYPE_CHECKING:
+    import mypy.nodes
+
+    import libcst.metadata.mypy_utils
+
+
+def raise_on_mypy_non_installed() -> None:
+    if not MYPY_INSTALLED:
+        raise RuntimeError("mypy is not installed, please install it")
+
+
+class MypyTypeInferenceProvider(
+    BatchableMetadataProvider["libcst.metadata.mypy_utils.MypyType"]
+):
+    """
+    Access inferred type annotation through `mypy <http://mypy-lang.org/>`_.
+    """
+
+    METADATA_DEPENDENCIES = (PositionProvider,)
+
+    @classmethod
+    def gen_cache(
+        cls, root_path: Path, paths: List[str], timeout: Optional[int] = None
+    ) -> Mapping[
+        str, Optional["libcst.metadata.mypy_utils.MypyTypeInferenceProviderCache"]
+    ]:
+        raise_on_mypy_non_installed()
+
+        import mypy.build
+        import mypy.main
+
+        from libcst.metadata.mypy_utils import MypyTypeInferenceProviderCache
+
+        targets, options = mypy.main.process_options(paths)
+        options.preserve_asts = True
+        options.fine_grained_incremental = True
+        options.use_fine_grained_cache = True
+        mypy_result = mypy.build.build(targets, options=options)
+        cache = {}
+        for path in paths:
+            module = calculate_module_and_package(str(root_path), path).name
+            cache[path] = MypyTypeInferenceProviderCache(
+                module_name=module,
+                mypy_file=mypy_result.graph[module].tree,
+            )
+        return cache
+
+    def __init__(
+        self,
+        cache: Optional["libcst.metadata.mypy_utils.MypyTypeInferenceProviderCache"],
+    ) -> None:
+        from libcst.metadata.mypy_utils import CodeRangeToMypyNodesBinder
+
+        super().__init__(cache)
+        self._mypy_node_locations: Dict[CodeRange, "mypy.nodes.Node"] = {}
+        if cache is None:
+            return
+        code_range_to_mypy_nodes_binder = CodeRangeToMypyNodesBinder(cache.module_name)
+        code_range_to_mypy_nodes_binder.visit_mypy_file(cache.mypy_file)
+        self._mypy_node_locations = code_range_to_mypy_nodes_binder.locations
+
+    def _parse_metadata(self, node: cst.CSTNode) -> None:
+        range = self.get_metadata(PositionProvider, node)
+        if range in self._mypy_node_locations:
+            self.set_metadata(node, self._mypy_node_locations.get(range))
+
+    def visit_Name(self, node: cst.Name) -> Optional[bool]:
+        self._parse_metadata(node)
+
+    def visit_Attribute(self, node: cst.Attribute) -> Optional[bool]:
+        self._parse_metadata(node)
+
+    def visit_Call(self, node: cst.Call) -> Optional[bool]:
+        self._parse_metadata(node)
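
The provider above attaches a type to a CST node only when the node's code range is exactly equal to a range recorded from the mypy tree. The sketch below isolates that position-keyed lookup using plain dataclasses; `Pos`, `Span` and `lookup` are hypothetical stand-ins for `CodePosition`, `CodeRange` and `_parse_metadata`, and the sample ranges are made up for illustration rather than taken from a mypy run.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Pos:
    line: int  # 1-based line number
    column: int  # 0-based column offset


@dataclass(frozen=True)
class Span:
    start: Pos
    end: Pos


def lookup(types_by_span: Dict[Span, str], span: Span) -> Optional[str]:
    """Return the type recorded for exactly this span, or None if there is none."""
    return types_by_span.get(span)


# "Type checker side": spans collected while walking the checker's tree.
types_by_span = {Span(Pos(1, 0), Pos(1, 1)): "builtins.list[builtins.int]"}

# "CST side": the span of a node as reported by a position provider.
print(lookup(types_by_span, Span(Pos(1, 0), Pos(1, 1))))
print(lookup(types_by_span, Span(Pos(1, 0), Pos(1, 2))))  # no exact match
```

Frozen dataclasses are hashable, which is what makes a range usable as a dictionary key. The flip side of this design is that a node gets no metadata whenever the two trees disagree on a range, even by one column.
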

libcst/metadata/mypy_utils.py

Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
+from dataclasses import dataclass, field
+from typing import Dict, Optional, Union
+
+import mypy.build
+import mypy.main
+import mypy.modulefinder
+import mypy.nodes
+import mypy.options
+import mypy.patterns
+import mypy.traverser
+import mypy.types
+import mypy.typetraverser
+
+from libcst._add_slots import add_slots
+from libcst._position import CodePosition, CodeRange
+
+
+@add_slots
+@dataclass(frozen=True)
+class MypyTypeInferenceProviderCache:
+    module_name: str
+    mypy_file: mypy.nodes.MypyFile
+
+
+@add_slots
+@dataclass(frozen=True)
+class MypyType:
+    is_type_constructor: bool
+    mypy_type: Optional[Union[mypy.types.Type, mypy.nodes.TypeInfo]] = None
+    fullname: str = field(init=False)
+
+    def __post_init__(self) -> None:
+        if isinstance(self.mypy_type, mypy.types.Type):
+            fullname = str(self.mypy_type)
+        else:
+            fullname = self.mypy_type.fullname
+        if self.is_type_constructor:
+            fullname = f"typing.Type[{fullname}]"
+        object.__setattr__(self, "fullname", fullname)
+
+    def __str__(self) -> str:
+        return self.fullname
+
+
+class CodeRangeToMypyNodesBinder(
+    mypy.traverser.TraverserVisitor, mypy.typetraverser.TypeTraverserVisitor
+):
+    def __init__(self, module_name: str) -> None:
+        super().__init__()
+        self.locations: Dict[CodeRange, MypyType] = {}
+        self.in_type_alias_expr = False
+        self.module_name = module_name
+
+    # Helpers
+
+    @staticmethod
+    def get_code_range(o: mypy.nodes.Context) -> CodeRange:
+        return CodeRange(
+            start=CodePosition(o.line, o.column),
+            end=CodePosition(o.end_line, o.end_column),
+        )
+
+    @staticmethod
+    def check_bounds(o: mypy.nodes.Context) -> bool:
+        return (
+            (o.line is not None)
+            and (o.line >= 1)
+            and (o.column is not None)
+            and (o.column >= 0)
+            and (o.end_line is not None)
+            and (o.end_line >= 1)
+            and (o.end_column is not None)
+            and (o.end_column >= 0)
+        )
+
+    def record_type_location_using_code_range(
+        self,
+        code_range: CodeRange,
+        t: Optional[Union[mypy.types.Type, mypy.nodes.TypeInfo]],
+        is_type_constructor: bool,
+    ) -> None:
+        if t is not None:
+            self.locations[code_range] = MypyType(
+                is_type_constructor=is_type_constructor, mypy_type=t
+            )
+
+    def record_type_location(
+        self,
+        o: mypy.nodes.Context,
+        t: Optional[Union[mypy.types.Type, mypy.nodes.TypeInfo]],
+        is_type_constructor: bool,
+    ) -> None:
+        if self.check_bounds(o):
+            self.record_type_location_using_code_range(
+                code_range=self.get_code_range(o),
+                t=t,
+                is_type_constructor=is_type_constructor,
+            )
+
+    def record_location_by_name_expr(
+        self, code_range: CodeRange, o: mypy.nodes.NameExpr, is_type_constructor: bool
+    ) -> None:
+        if isinstance(o.node, mypy.nodes.Var):
+            self.record_type_location_using_code_range(
+                code_range=code_range, t=o.node.type, is_type_constructor=False
+            )
+        elif isinstance(o.node, mypy.nodes.TypeInfo):
+            self.record_type_location_using_code_range(
+                code_range=code_range, t=o.node, is_type_constructor=is_type_constructor
+            )
+
+    # Actual visitors
+
+    def visit_var(self, o: mypy.nodes.Var) -> None:
+        super().visit_var(o)
+        self.record_type_location(o=o, t=o.type, is_type_constructor=False)
+
+    def visit_name_expr(self, o: mypy.nodes.NameExpr) -> None:
+        super().visit_name_expr(o)
+        # Implementation in base classes is omitted, record it if it is variable or class
+        self.record_location_by_name_expr(
+            self.get_code_range(o), o, is_type_constructor=True
+        )
+
+    def visit_member_expr(self, o: mypy.nodes.MemberExpr) -> None:
+        super().visit_member_expr(o)
+        # Implementation in base classes is omitted, record it
+        # o.def_var should not be None after mypy run, checking here just to be sure
+        if o.def_var is not None:
+            self.record_type_location(o=o, t=o.def_var.type, is_type_constructor=False)
+
+    def visit_call_expr(self, o: mypy.nodes.CallExpr) -> None:
+        super().visit_call_expr(o)
+        if isinstance(o.callee, mypy.nodes.NameExpr):
+            self.record_location_by_name_expr(
+                code_range=self.get_code_range(o), o=o.callee, is_type_constructor=False
+            )
+
+    def visit_instance(self, o: mypy.types.Instance) -> None:
+        super().visit_instance(o)
+        self.record_type_location(o=o, t=o, is_type_constructor=False)

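
`MypyType` above computes `fullname` once in `__post_init__`, and because the dataclass is frozen it has to write the field through `object.__setattr__`. The sketch below reproduces that pattern with the standard library only. `TypeLabel` and its `base_name` field are hypothetical simplifications: the real class stores a mypy object rather than a string.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class TypeLabel:
    is_type_constructor: bool
    base_name: str
    # Derived field: excluded from __init__ and filled in by __post_init__.
    fullname: str = field(init=False)

    def __post_init__(self) -> None:
        fullname = self.base_name
        if self.is_type_constructor:
            # A reference to the class itself rather than to an instance of it.
            fullname = f"typing.Type[{fullname}]"
        # Frozen dataclasses block normal assignment, so bypass __setattr__.
        object.__setattr__(self, "fullname", fullname)

    def __str__(self) -> str:
        return self.fullname


print(TypeLabel(is_type_constructor=False, base_name="builtins.set"))
print(TypeLabel(is_type_constructor=True, base_name="builtins.set"))

try:
    TypeLabel(False, "builtins.int").fullname = "changed"
except FrozenInstanceError:
    print("instances stay immutable after construction")
```
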
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
+
+import sys
+from pathlib import Path
+from unittest import skipIf
+
+import libcst as cst
+from libcst import MetadataWrapper
+from libcst.metadata.mypy_type_inference_provider import MypyTypeInferenceProvider
+from libcst.testing.utils import data_provider, UnitTest
+from libcst.tests.test_pyre_integration import TEST_SUITE_PATH
+
+
+def _test_simple_class_helper(test: UnitTest, wrapper: MetadataWrapper) -> None:
+    mypy_nodes = wrapper.resolve(MypyTypeInferenceProvider)
+    m = wrapper.module
+    assign = cst.ensure_type(
+        cst.ensure_type(
+            cst.ensure_type(
+                cst.ensure_type(m.body[1].body, cst.IndentedBlock).body[0],
+                cst.FunctionDef,
+            ).body.body[0],
+            cst.SimpleStatementLine,
+        ).body[0],
+        cst.AnnAssign,
+    )
+    self_number_attr = cst.ensure_type(assign.target, cst.Attribute)
+    test.assertEqual(str(mypy_nodes[self_number_attr]), "builtins.int")
+
+    # self
+    test.assertEqual(
+        str(mypy_nodes[self_number_attr.value]), "libcst.tests.pyre.simple_class.Item"
+    )
+    collector_assign = cst.ensure_type(
+        cst.ensure_type(m.body[3], cst.SimpleStatementLine).body[0], cst.Assign
+    )
+    collector = collector_assign.targets[0].target
+    test.assertEqual(
+        str(mypy_nodes[collector]), "libcst.tests.pyre.simple_class.ItemCollector"
+    )
+    items_assign = cst.ensure_type(
+        cst.ensure_type(m.body[4], cst.SimpleStatementLine).body[0], cst.AnnAssign
+    )
+    items = items_assign.target
+    test.assertEqual(
+        str(mypy_nodes[items]), "typing.Sequence[libcst.tests.pyre.simple_class.Item]"
+    )
+
+
+class MypyTypeInferenceProviderTest(UnitTest):
+    @data_provider(
+        ((TEST_SUITE_PATH / "simple_class.py", TEST_SUITE_PATH / "simple_class.json"),)
+    )
+    def test_simple_class_types(self, source_path: Path, data_path: Path) -> None:
+        file = str(source_path)
+        repo_root = Path(__file__).parents[3]
+        cache = MypyTypeInferenceProvider.gen_cache(repo_root, [file])
+        wrapper = MetadataWrapper(
+            cst.parse_module(source_path.read_text()),
+            cache={MypyTypeInferenceProvider: cache[file]},
+        )
+        _test_simple_class_helper(self, wrapper)
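
The test module above imports `skipIf` and `sys` but the test shown never uses them, so it would fail rather than skip when mypy is missing. One possible guard, assuming the intent is to skip in that case, is sketched below; `MYPY_AVAILABLE` and `ExampleMypyDependentTest` are hypothetical names that do not appear in the commit.

```python
import importlib.util
import unittest

# True when an importable `mypy` package is found; nothing is imported here.
MYPY_AVAILABLE = importlib.util.find_spec("mypy") is not None


@unittest.skipUnless(MYPY_AVAILABLE, "mypy is not installed")
class ExampleMypyDependentTest(unittest.TestCase):  # hypothetical test class
    def test_mypy_is_importable(self) -> None:
        # Only reached when the guard above found mypy.
        import mypy.build  # noqa: F401

        self.assertTrue(MYPY_AVAILABLE)
```

With this guard a missing mypy is reported as a skipped test instead of surfacing as the `RuntimeError` raised by `raise_on_mypy_non_installed`.
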

requirements-dev.txt

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ hypothesis>=4.36.0
 hypothesmith>=0.0.4
 jupyter>=1.0.0
 maturin>=0.8.3,<0.14
+mypy>=0.991
 nbsphinx>=0.4.2
 prompt-toolkit>=2.0.9
 pyre-check==0.9.9; platform_system != "Windows"
