This is an experimental fork of protoc and the associated gRPC Python compiler grpc_tools.protoc.
The package is not intended for production but rather to provide a lightweight, self-contained environment from which to experiment with these packages.
The code generated by grpc_tools.protoc --python_out depends on protobuf, and the code generated by grpc_tools.protoc --grpc_python_out depends on grpcio. Neither package is strictly necessary for studying the code generator, which will generate code without any package available to run it. However, if they are not present in the local environment, it is recommended that setup.py be allowed to install them, so that the generated code's descriptor references can be tested for correctness.
The intention of this experimental fork is to investigate the possibility of parameterizing the module names of generated Python code. The default code generator uses the relative paths of imported proto files to determine the imports to generate for Python. For specific use cases these imports are well-formed, but they are not flexible; moreover, there is ample evidence of a long-running incompatibility between standard Python use cases and the imports generated by grpc_tools.protoc:
- protocolbuffers/protobuf#90
- protocolbuffers/protobuf#881
- protocolbuffers/protobuf#1491
- protocolbuffers/protobuf#2283
- protocolbuffers/protobuf#1511
- protocolbuffers/protobuf#3728
- protocolbuffers/protobuf#7470
- https://news.ycombinator.com/item?id=21873468
- https://stackoverflow.com/questions/63832738/grpc-tools-protoc-generates-python-files-with-broken-imports
- https://stackoverflow.com/questions/65135519/python3-grpc-compiler-how-to-handle-absolute-and-relative-imports-in-protos
- https://stackoverflow.com/questions/71869672/python-grpc-cannot-import-generated-module-from-proto
- https://stackoverflow.com/questions/73939276/grpc-tools-protoc-generates-broken-pb2-python-file
Other languages' code generators are well aware of the need for parameterization in this case. The C++ code generator mirrors proto file structure in the namespaces of generated code. Code generators for Go, Java, Objective-C, C#, and even PHP (!!??) explicitly recognize options to override the default namespace or class name, thus providing a means of generating code that can be predictably referenced.
Python use cases, on the other hand, are hamstrung not only by the inability to do so, but also by the fact that top-level imports are rendered as import [module_name]. In Python 3, which no longer supports implicit relative imports (a change first announced in 2004), such an import is valid only if the directory containing [module_name].py is on the PYTHONPATH, so most use cases require housing the entrypoint Python code in the same location. Many developers would prefer to be able to dump all grpc_tools.protoc outputs in a separate directory, not intermingled with the protos themselves, and certainly not intermingled with standard code and other package machinery.
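The failure mode described above can be reproduced without protobuf installed. The sketch below writes two stand-in modules into a package directory; they are simplified placeholders rather than real grpc_tools.protoc output, and the names demo_generated, demo_common_pb2, and service_pb2 are hypothetical. Only the shape of the import statement matters.

```python
import importlib
import pathlib
import sys
import tempfile


def write_stand_in_package(root: pathlib.Path) -> pathlib.Path:
    """Create root/demo_generated/ holding two placeholder modules."""
    pkg = root / "demo_generated"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "demo_common_pb2.py").write_text("VALUE = 42\n")
    # Top-level import rendered in the `import [module_name]` style.
    (pkg / "service_pb2.py").write_text(
        "import demo_common_pb2 as demo__common__pb2\n"
        "VALUE = demo__common__pb2.VALUE\n"
    )
    return pkg


def try_import(name: str) -> str:
    """Return 'ok' if the module imports, else the exception class name."""
    try:
        importlib.import_module(name)
        return "ok"
    except ImportError as exc:
        return type(exc).__name__


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        pkg = write_stand_in_package(root)
        added = [str(root), str(pkg)]
        sys.path.insert(0, added[0])
        importlib.invalidate_caches()
        try:
            # Importing through the package fails: the sibling module is
            # not visible as a top-level module.
            before = try_import("demo_generated.service_pb2")
            # Workaround: put the output directory itself on sys.path.
            sys.path.insert(0, added[1])
            importlib.invalidate_caches()
            after = try_import("demo_generated.service_pb2")
        finally:
            # Undo the sys.path and sys.modules changes made by the demo.
            for entry in added:
                if entry in sys.path:
                    sys.path.remove(entry)
            for name in list(sys.modules):
                if name.startswith("demo_"):
                    del sys.modules[name]
    return before, after


if __name__ == "__main__":
    print(demo())
```

The first attempt is expected to report ModuleNotFoundError and the second to succeed, which is the intermingling workaround most users end up with.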
There are a few possible approaches to address this with more flexibility:
- Explicit relative imports: this would entail the generated code importing other generated code explicitly by prefixing the top-level package name with the appropriate number of dots.
- Absolute imports: this would entail the understanding that all protos are to be dumped in a predetermined namespace package, and prefixing the module names with this top-level package to ensure that no imports can be interpreted as Python 2-style implicit relative imports. The top-level package would need to be importable from the PYTHONPATH of the environment running the generated code, which is a portable approach. A common package name for some set of generated code in a system encourages reuse of code. Several such packages could be referenced, allowing one to use proto package names (as defined by the package [package_name] directive) to refer to the corresponding package of generated code on the PYTHONPATH without namespace pollution or collisions.
- Proto options: this would entail registering a protobuf option within proto files which offers some means of mapping from protobuf namespace to Python 3 modules. As mentioned above, many languages (almost all non-C++ languages supported by the C++-based protobuf package) have support for such an option. This solution could pose some problems for third-party proto files used in conjunction with local ones, but in most cases this is a contrived concern, as it is even simpler to implement a flag to override this specifically for the Python compiler.
- protogen extension: this would entail building a Go extension that handles the conversion from IDL to Python source code. It may actually be the simplest way to address the problem quickly with a package that is tangential to the main protobuf and grpc distributions, provided that it could be rolled up as a pip package for easy setup. This approach would avoid having to justify the logic of the extension in a PR (something which doesn't appear likely to succeed).
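The effect of the first two approaches can be previewed by post-processing generated source text instead of changing the generator. The sketch below is a stand-in for illustration, not what this fork implements. It assumes generated imports take the shapes import foo_pb2 as foo__pb2 and from a.b import foo_pb2 as a_dot_b_dot_foo__pb2; the function name rewrite_imports and the package name acme_protos are hypothetical.

```python
import re

# Import shapes assumed for generated modules (an assumption, see lead-in):
#   import foo_pb2 as foo__pb2
#   from a.b import foo_pb2 as a_dot_b_dot_foo__pb2
_TOP_LEVEL = re.compile(r"^import (\w+_pb2(?:_grpc)?)( as \w+)?$", re.MULTILINE)
_NESTED = re.compile(
    r"^from ([\w.]+) import (\w+_pb2(?:_grpc)?)( as \w+)?$", re.MULTILINE
)


def rewrite_imports(source: str, prefix: str) -> str:
    """Rewrite generated-module imports under `prefix`.

    prefix="acme_protos" yields absolute imports rooted at that package;
    prefix="." yields explicit relative imports, which is only correct when
    the importing module sits at the root of the output package.
    """

    def nested(match: re.Match) -> str:
        package, module, alias = match.group(1), match.group(2), match.group(3) or ""
        # Leave the well-known types shipped with protobuf untouched.
        if package == "google.protobuf" or package.startswith("google.protobuf."):
            return match.group(0)
        joined = prefix + package if prefix.endswith(".") else f"{prefix}.{package}"
        return f"from {joined} import {module}{alias}"

    def top_level(match: re.Match) -> str:
        return f"from {prefix} import {match.group(1)}{match.group(2) or ''}"

    # Nested imports first, so rewritten top-level lines are not prefixed twice.
    return _TOP_LEVEL.sub(top_level, _NESTED.sub(nested, source))


SAMPLE = (
    "import common_pb2 as common__pb2\n"
    "from a.b import types_pb2 as a_dot_b_dot_types__pb2\n"
    "from google.protobuf import timestamp_pb2 as ts__pb2\n"
)

if __name__ == "__main__":
    print(rewrite_imports(SAMPLE, "acme_protos"))
    print(rewrite_imports(SAMPLE, "."))
```

A text rewrite like this is brittle compared with a generator-level option, which is part of the motivation for the fork.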
In general it seems most of the reasons given for not implementing such a feature come from concerns that would not affect the typical Python user. One must admit there hasn't been unanimous agreement on the best way to address the issue, but the issue looms large in ProtocolBuffers/gRPC Python nonetheless.
The approach taken in the dev_proto branch is experimental and goes a bit further. In practice, most users' proto files, if at all distinguished by location, happen to be distinguished by the --proto_path or -I flag provided at compile time. The command_line_interface.cc implementation uses a virtual file/disk file mapping to group together all protos in a dependency tree (implemented as SourceTree). Assuming one has avoided name conflicts, each proto file can be uniquely mapped to the include path it was found in; in practice these include paths will have some relation to the desired namespace or packaging of the protos. However, the command line interface does not share this information with any CodeGenerator. The changes to command_line_interface.cc in dev_proto instead record the mapping from virtual proto filename to physical disk filename for the entire transitive dependency tree of all protos compiled, and package this information into a parameter string. This parameter string is set aside and, only when the --python_out option is used, passed to the generator alongside the parameters given at the command line. The Python generator then parses the two sets of parameters separately. At this stage nothing further is done, but the next developments on this branch will clean up the Python generator's module naming semantics so that they are more easily parameterized. At that point the parameters may be used to map from include path to top-level Python package name.
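The mapping idea can be sketched in Python, although the fork implements it in C++ inside command_line_interface.cc. Everything below is illustrative: the function names are hypothetical, the first-match-wins lookup is an assumption about how include paths are searched, and the virtual=disk comma-separated encoding is an invented format, not necessarily the one used on the branch.

```python
import os
import pathlib
import tempfile
from typing import Dict, List, Optional


def resolve_include_path(virtual_name: str, include_paths: List[str]) -> Optional[str]:
    """Return the first include path under which `virtual_name` exists."""
    for root in include_paths:
        if os.path.isfile(os.path.join(root, virtual_name)):
            return root
    return None


def build_mapping(virtual_names: List[str], include_paths: List[str]) -> Dict[str, str]:
    """Map each virtual proto filename to the disk file it resolves to."""
    mapping = {}
    for name in virtual_names:
        root = resolve_include_path(name, include_paths)
        if root is None:
            raise FileNotFoundError(name)
        mapping[name] = os.path.join(root, name)
    return mapping


def encode_parameter(mapping: Dict[str, str]) -> str:
    # Hypothetical encoding; paths containing ',' or '=' would need escaping.
    return ",".join(f"{virtual}={disk}" for virtual, disk in sorted(mapping.items()))


def decode_parameter(parameter: str) -> Dict[str, str]:
    pairs = (item.partition("=") for item in parameter.split(",") if item)
    return {virtual: disk for virtual, _, disk in pairs}


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        for rel in ("vendor/common.proto", "app/service.proto"):
            path = root / rel
            path.parent.mkdir(parents=True)
            path.write_text('syntax = "proto3";\n')
        include_paths = [str(root / "vendor"), str(root / "app")]
        mapping = build_mapping(["common.proto", "service.proto"], include_paths)
        round_trip_ok = decode_parameter(encode_parameter(mapping)) == mapping
        # Report which include directory each virtual file was found under.
        found_in = {name: pathlib.Path(disk).parent.name for name, disk in mapping.items()}
    return round_trip_ok, found_in


if __name__ == "__main__":
    print(demo())
```

With the include path recovered per file, a generator could later translate, say, the vendor include path into a top-level Python package name of the user's choosing.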
This fork groups a few distinct components from the gRPC and protobuf packages:
- third_party/protobuf/src/google/protobuf/: This is the C++ protoc implementation, as shipped by gRPC in the top-level third_party/protobuf/ folder of the repo (it is copied in here, as it would be in a build, for convenience). It implements protoc's core functionality, including the protobuf parser and the importer (which resolves references found by the parser), to produce the IDL representation of a proto file. It also implements the interfaces for code generators and plugins, and the unified command line interface through which they are accessed. In particular, the Python code generator sources are in third_party/protobuf/src/google/protobuf/compiler/python/. The setup.py for the Python package copies protoc's protos into grpc_tools/_proto/google/protobuf/. These protos are used by the protoc implementation to construct IDL representations of protobuf objects such as file, message, enum, type, and service descriptors, among others. This path should be in the proto path when using protoc (this is done automatically in the grpc_tools Python wrapper, so it is only a concern when using a protoc binary directly).
- grpc_root/src/compiler/ and grpc_root/include/: These are the source files for all of the gRPC code generators; in particular, the Python code generator. The generators implemented here are complementary to the proto generators implemented in third_party. The gRPC generators depend on core protoc dependencies to read proto files and inherit from the CodeGenerator interface; however, they are independent of the protoc language plugins. The includes consist of each of *.c, *.cc (Linux), and *.cpp (Windows) compatible header files.
- grpc_tools/: This is the core Python interface exposing protoc's Python compiler and its gRPC Python compiler as the python -m grpc_tools.protoc command-line tool. This tool depends on the core protoc command line interface, as well as the Python protoc language plugin (exposed via the --python_out argument). However, the dependency on the language plugin is superficial; the grpc_python_generator does not otherwise interact with the plugin or share a GeneratorContext instance.
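The grpc_tools entry point can also be driven from Python rather than from the shell. The sketch below assembles an argument list from the flags mentioned in this document and hands it to grpc_tools.protoc.main, which is understood to take an argv-style list (program name first) and return the compiler's exit status. The helper names build_protoc_argv and compile_protos, and the example paths, are hypothetical.

```python
from typing import List, Sequence


def build_protoc_argv(
    proto_paths: Sequence[str],
    python_out: str,
    grpc_python_out: str,
    proto_files: Sequence[str],
) -> List[str]:
    """Assemble an argv-style list for the grpc_tools.protoc entry point."""
    argv = ["grpc_tools.protoc"]  # argv[0] is only a program-name placeholder
    argv += [f"--proto_path={path}" for path in proto_paths]
    argv.append(f"--python_out={python_out}")
    argv.append(f"--grpc_python_out={grpc_python_out}")
    argv += list(proto_files)
    return argv


def compile_protos(argv: Sequence[str]) -> int:
    """Run the compiler in-process; requires grpc_tools to be installed."""
    # Imported lazily so build_protoc_argv works without grpc_tools.
    from grpc_tools import protoc

    return protoc.main(list(argv))


if __name__ == "__main__":
    # Example paths are placeholders; adjust to the local layout.
    print(build_protoc_argv(["protos"], "gen", "gen", ["protos/service.proto"]))
```

Keeping argument construction separate from invocation makes it easy to inspect exactly which include paths and output directories the compiler will see.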
python3 -m venv env
env/bin/pip install cython
GRPC_PYTHON_BUILD_WITH_CYTHON=1 env/bin/pip -vvv install -e .