Plugin: Use mypy to enrich AST with types #12513
Comments
Seems related to #4868
I think so. However, I don't know yet if this is trivial. I'm trying to figure out how to do it.
There is no documentation on how to do it. But I believe this is not a good approach.
Mypy builds a tree with type information, but I don't know all the details. You should look into something like this:
```python
import mypy.main as MAIN
import mypy.build as BUILD

py_file = "src/mod/test"
# Dotted module name derived from the path (assumed to match the key in the build graph)
mod = py_file.replace("/", ".")
files, opt = MAIN.process_options([f"{py_file}.py"])
# Keep the ASTs in memory after the build finishes
opt.preserve_asts = True
opt.fine_grained_incremental = True
result = BUILD.build(files, options=opt)
print(str(result.graph[mod].tree))
```
@jepperaskdk I'm trying to find a way to produce an XML representation of the Python AST enriched with the mypy type information. It seems that this is not so trivial. I have the following hypotheses or doubts
Thus, enriching the AST with types becomes a hard task if you want to keep the same original structure.
In the Python `ast` module we can do the following:

```python
import ast

tree = ast.parse(txt, filename)

def transform(node):
    # Pair each field name with its value on this node
    node_fields = zip(
        node._fields, (getattr(node, attr) for attr in node._fields))
    for field_name, field_value in node_fields:
        ...  # stuff and recursion
```

I don't know how we can have a similar implementation using the result from `mypy.build`. If you want to attack this problem, we can talk more. Maybe we can figure out how to achieve this.
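For reference, the field-by-field traversal described above can be filled in as a small runnable sketch that turns a plain `ast` tree into XML. It uses only the standard library (`ast.iter_fields` and `xml.etree.ElementTree`); the function name `to_xml` and the element layout are illustrative assumptions, not something taken from mypy or pyastrx, and no type information is attached here.

```python
import ast
import xml.etree.ElementTree as ET


def to_xml(node: ast.AST) -> ET.Element:
    """Recursively convert an ast node into an XML element (hypothetical layout)."""
    elem = ET.Element(type(node).__name__)
    for field_name, field_value in ast.iter_fields(node):
        if isinstance(field_value, ast.AST):
            # A single child node: wrap it in an element named after the field.
            child = ET.SubElement(elem, field_name)
            child.append(to_xml(field_value))
        elif isinstance(field_value, list):
            # A list of children (e.g. a function body or assignment targets).
            child = ET.SubElement(elem, field_name)
            for item in field_value:
                if isinstance(item, ast.AST):
                    child.append(to_xml(item))
                else:
                    ET.SubElement(child, "item").text = repr(item)
        elif field_value is not None:
            # Plain values (names, constants) become attributes.
            elem.set(field_name, repr(field_value))
    return elem


source = "x = [42]\n"
root = to_xml(ast.parse(source))
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

If inferred types were available per node, they could be stored as one more attribute on each element; mapping mypy's results back onto these nodes is the part that remains open.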
Strange, it seems that the cache for each file doesn't store any information about the types of the variables inside each function.
This change is an RFC (please read the whole change message).

Add `MypyTypeInferenceProvider` as an alternative to `TypeInferenceProvider`. The provider infers types using mypy as a library. The only requirement for usage is to have the latest mypy installed.

The inferred types are mypy types: the mypy type system is well designed, and using it directly avoids a conversion and keeps things simple. For compatibility and extensibility reasons, these types are stored in a separate field, `MypyType.mypy_type`.

Let's assume we have the following code in the file `x.py`, which we want to inspect:

```python
x = [42]
s = set()
from enum import Enum
class E(Enum):
    f = "f"
e = E.f
```

To play with mypy types, one can use code like this:

```python
import libcst as cst
from libcst.metadata import MypyTypeInferenceProvider

filename = "x.py"
module = cst.parse_module(open(filename).read())
cache = MypyTypeInferenceProvider.gen_cache(".", [filename])[filename]
wrapper = cst.MetadataWrapper(
    module,
    cache={MypyTypeInferenceProvider: cache},
)
mypy_type = wrapper.resolve(MypyTypeInferenceProvider)
x_name_node = wrapper.module.body[0].body[0].targets[0].target
set_call_node = wrapper.module.body[1].body[0].value
e_name_node = wrapper.module.body[-1].body[0].targets[0].target
print(mypy_type[x_name_node])
# prints: builtins.list[builtins.int]
print(mypy_type[x_name_node].fullname)
# prints: builtins.list[builtins.int]
print(mypy_type[x_name_node].mypy_type.type.fullname)
# prints: builtins.list
print(mypy_type[x_name_node].mypy_type.args)
# prints: (builtins.int,)
print(mypy_type[x_name_node].mypy_type.type.bases[0].type.fullname)
# prints: typing.MutableSequence
print(mypy_type[set_call_node])
# prints: builtins.set
print("issuperset" in mypy_type[set_call_node].mypy_type.names)
# prints: True
print(mypy_type[set_call_node.func])
# prints: typing.Type[builtins.set]
print(mypy_type[e_name_node].mypy_type.type.is_enum)
# prints: True
```

Why?

1. `TypeInferenceProvider` requires pyre (`pyre-check` on PyPI) to be installed. mypy is more popular than pyre. If the organization already uses mypy (which is almost always the case), it may be difficult to convince colleagues (including the security team) that "we need yet another type checker". `MypyTypeInferenceProvider` requires only the latest mypy.
2. Even though it is possible to run pyre without installing watchman, this is not advertised. Installing watchman is not always possible because of system requirements, or because of security requirements like "we install only our favorite GNU/Linux distribution packages".
3. Using `TypeInferenceProvider` requires running `pyre start` before the execution and `pyre stop` after it. This may be inconvenient, especially when pyre was not used before.
4. Types produced by pyre in `TypeInferenceProvider` are just strings. For example, it is not easy to infer that some variable is an enum instance. `MypyTypeInferenceProvider` makes it easy, see the code above.

Drawbacks:

1. Speed. mypy is slower than pyre, and so is `MypyTypeInferenceProvider` compared to `TypeInferenceProvider`. How to partially solve this:
    1. Implement AST tree caching in mypy. It may be difficult, but it will lead to speed improvements for all the projects that use this functionality.
    2. Implement inferred types caching inside LibCST. As far as I know, no caching at all is implemented inside LibCST, which is the prerequisite for inferred types caching, so the task is big.
    3. Implement a LibCST CST to mypy AST conversion. I am not sure if this is possible at all. Even if it is possible, the task is huge.
2. Two providers doing similar things will be present in LibCST. This can potentially lead to a situation where two type checkers need to be installed to get all codemods from the library running.

Alternatives considered:

1. Put `MypyTypeInferenceProvider` inside a separate library (say, LibCST-mypy or `libcst-mypy` on PyPI). This will explicitly separate `MypyTypeInferenceProvider` from the rest of LibCST. Drawbacks:
    1. The need to maintain a separate library.
    2. Limited fame (people need to know that the library exists).
    3. Since some codemods cannot be implemented easily without the library, for example, an `if-elif-else` to `match` converter (it needs powerful type inference), they are doomed to not be shipped with LibCST, which makes the latter less attractive for end users.
2. Implement a base class for the inferred type, which inherits from `str` (to keep compatibility with the existing codebase), and a mechanism for dynamically selecting the `TypeInferenceProvider` type checker (mypy or pyre; the user can do this via an environment variable). If the code inside LibCST requires just shallow type information (so, just `str` is enough), then the code can run with any type checker. The remaining code (such as the `if-elif-else` to `match` converter) will still require mypy.

Misc: The code does not lint in my env; for some reason `pyre check` cannot find the `mypy` library.

Related to:

* Instagram#451
* pyastrx/pyastrx#40
* python/mypy#12513
* python/mypy#4868
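The second alternative in the message above (an inferred type that inherits from `str`, with the type checker selected through an environment variable) could look roughly like the sketch below. Everything here is hypothetical: the class name `InferredType`, the `native` attribute, the `select_checker` helper, and the `LIBCST_TYPECHECKER` variable name are illustrative assumptions and are not part of LibCST.

```python
from typing import Any, Mapping, Optional


class InferredType(str):
    """Hypothetical inferred type that still behaves like a plain string."""

    # Checker-specific payload (for example, a mypy type object); None if absent.
    native: Optional[Any]

    def __new__(cls, fullname: str, native: Optional[Any] = None) -> "InferredType":
        obj = super().__new__(cls, fullname)
        obj.native = native
        return obj


def select_checker(env: Mapping[str, str]) -> str:
    """Pick the type checker from a hypothetical environment variable."""
    return env.get("LIBCST_TYPECHECKER", "pyre")


# Code that only needs shallow information keeps treating the value as a string.
t = InferredType("builtins.list[builtins.int]", native={"args": ("builtins.int",)})
print(t.startswith("builtins.list"), t.native, select_checker({}))
```

Code that needs deeper information would check `native` and fall back (or fail) when it is `None`, which is the point where the remaining codemods would still require mypy.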
With the `inspect`/`ast` modules I can get an AST of e.g. a function and inspect variables, but in terms of types, I can only get the annotations from the signature (AFAIK). Can mypy enrich the rest of the tree with types? And preferably expose this so that plugins can be written for it?
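To illustrate the limitation described above, here is a minimal sketch using only the standard `ast` module (Python 3.9+ for `ast.unparse`). The sample function `scale` is made up for the example. The signature annotations are present in the tree, while local variables such as `result` are bare names with no type attached.

```python
import ast

source = '''
def scale(values: list[int], factor: float) -> list[float]:
    result = [v * factor for v in values]
    return result
'''

func = ast.parse(source).body[0]

# Annotations written in the signature are available as AST nodes.
annotations = {arg.arg: ast.unparse(arg.annotation) for arg in func.args.args}
return_annotation = ast.unparse(func.returns)

# Names assigned inside the body carry no type information in the plain AST.
local_names = sorted(
    node.id
    for node in ast.walk(func)
    if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
)
print(annotations, return_annotation, local_names)
```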
Does mypy already build a typed AST under the hood? Can you point me to where it happens?
Thank you