-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[mlir python] Dis-aggregating MLIRPythonExtension.Core #56037
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@llvm/issue-subscribers-mlir-python |
Yes! I started a patch a few weeks ago to break this apart and got it to the point that I could use it (but needed more work to land). Let me try to remember what computer i stashed that branch on :) |
(by the numbers it only helped iree a little bit, but iree is already using most of mlir. I expect folks who are more just using a couple of dialects would benefit immensely) |
Awesome, thanks @stellaraccident! I'm happy to help here if I can. In CIRCT, I don't think our Python libraries are going to need anything more than the builtin dialect, so we should be able to show the benefit for projects on the other end of the spectrum. |
Here's the patch I started: stellaraccident@93daca1 This was enough to have IREE select only the parts it wanted so I could assess. I expect it would also let you see that in circt. You'll see that a key thing that needs to be done is to fix the commented out IRCore.cpp bit for mlirRegisterAllDialects. I had some thoughts about this -- none of them written down. Here is the idea:
# Register upstream dialects if available.
try:
from . import _mlirRegisterAllDialects
def _register_all_dialects_hook(context):
_mlirRegisterAllDialects.register_all_dialects(context)
except ModuleNotFoundError:
pass
else:
_mlir.globals.add_context_creation_hook(_register_all_dialects_hook)
# Defer to a project-level initializer for downstreams to customize.
try:
from . import _site_initializer
except ModuleNotFoundError:
pass The idea is that Python level glue would inject the context initialization logic. We would have logic to do this automatically if the installation was built with the _mlirRegisterAllDialects extension (like upstream is), or downstreams can provide their own There's some subtlety here and I am either happy to review if someone wants to take a stab, or I can find some time to implement this myself. |
That would make any new context automatically register all dialects, why would we do this? |
Well, isn't that what happens here today?
I am also concerned about the modularity, but I was worried about backwards compatibility in the event projects have come to depend on this. My understanding of Stella's sketch is projects that actually want all MLIR dialects loaded can use the new, separated _mlirRegisterAllDialects.so, but projects who don't could just not build/depend on that, and not use this global context hook at all if they don't want. I think we've actually asked for a similar mechanism before for CIRCT, and met the same concerns, so I'm not super attached to that. I just want a safe path to not compiling most of MLIR for the Python extensions, and as a secondary concern, not registering most of MLIR into my Python contexts. |
Indeed, and that a mistake IMO. This was done temporarily apparently (https://reviews.llvm.org/D88155 ) but as usual this is the kind of temporary that last (which is why I'm against this kind of shortcut in general).
That means that this global context hook can't be added to the context creation method, I'm not sure how it is invoked then? |
I was also thinking of adding an option to the context constructor to use the global registration hook. Leaving that aside, as a downstream: I want the ability to implicitly configure a context to my requirements and not leave that to user code. As an example: But I think this should be overridable. Perhaps |
(I am actually completely fine making the "site specific default" for upstream mlir require explicit registration, but I would still like the mechanism, and i would like to use it to let this disaggregation path keep behavior the same -- so that we can switch the user impacting behavior separately if desired) |
Sure, that makes sense, but I don't quite see how this justifies a MLIR-supplied global registration hook that would applies to all projects? |
Because that is a native class that downstreams get verbatim from the upstream API: there is no mechanism for customizing what it does, unless if we build a python-ey hook in -- which is what I propose here. To be clear, the only reason such a hook is coupled to this patch is because I feel strongly that I would like to fix the build layering separate from making a user impacting breaking change. I am also perfectly open to making this change but there is a fair bit more work to define dialect level registration hooks for upstream dialects to do it completely, and we need flexibility on the chicken/egg. I would have trouble doing it in one step, even if we wanted to. So in my mind, the customization hook is a two birds/one stone thing: downstreams want/benefit from it, and upstream can use it as an intermediate step to a more explicit API and build layering. |
Thinking about this some more, I think I can extend the mechanism get you where you want to go, Mehdi, but there is a lot of downstream debt in this area, and I'm not going to change the user behavior in one step. I'd be aiming for a more clear way to provide dialect registration entry points and a customization mechanism to enable good default behavior. I've got an idea on my mind but it probably needs a patch, not just a sketch to make it clear. |
In Python, I'd like to strike a good balance between avoiding the negative effects of global registration and having something usable out-of-the-box. Most Python packages are usable by This may not be very pythonic, but can't we just inherit contexts? Have a |
Yeah, we can do that if we want (will be a little gross since it will need to hack something into a module agree it is defined, but meh -- it would just be in an init file. I think I do need to spend some time and see about exposing dialect registration python side Ina better way: right now, we have no build-layered compatible way to register individual upstream dialects. I'm trying to avoid having an extension per dialect, but maybe that becomes necessary :( |
I took a stab at a more complete implementation. PTAL: https://reviews.llvm.org/D128593 |
I think I got the above patch to a final state, and IREE (which is what I tested with) no longer links in everything. PTAL. |
Note that this does implement the global hooks for registration and initialization that Mehdi didn't like, but it also separates the _BaseContext from the Context (which subclasses _BaseContext). The hooks only get applied to the Context. I work on a lot of downstreams, and I strongly believe that getting the default Context creation right for a distribution (which can now enforce whatever defaults it wants) is the right thing (note that because of the way namespacing works, things like IREE and CIRCT already have their own: iree.compiler.ir.Context and circt.ir.Context). Separating the classes like this should allow other policies/usage to be done in the future if there is ever a case and preserves API layering. I'm willing to debate names (i.e. _BaseContext vs Context, AllDialectsContext, etc), but I believe that what I have chosen satisfies the principle of least surprise for users, and it centralizes some of the hard-to-get-right site specific initialization. |
Thanks @stellaraccident! I will take a look and try out the patch tomorrow. |
…tion/loading activities. Since the very first commits, the Python and C MLIR APIs have had mis-placed registration/load functionality for dialects, extensions, etc. This was done pragmatically in order to get bootstrapped and then just grew in. Downstreams largely bypass and do their own thing by providing various APIs to register things they need. Meanwhile, the C++ APIs have stabilized around this and it would make sense to follow suit. The thing we have observed in canonical usage by downstreams is that each downstream tends to have native entry points that configure its installation to its preferences with one-stop APIs. This patch leans in to this approach with `RegisterEverything.h` and `mlir._mlir_libs._mlirRegisterEverything` being the one-stop entry points for the "upstream packages". The `_mlir_libs.__init__.py` now allows customization of the environment and Context by adding "initialization modules" to the `_mlir_libs` package. If present, `_mlirRegisterEverything` is treated as such a module. Others can be added by downstreams by adding a `_site_initialize_{i}.py` module, where '{i}' is a number starting with zero. The number will be incremented and corresponding module loaded until one is not found. Initialization modules can: * Perform load time customization to the global environment (i.e. registering passes, hooks, etc). * Define a `register_dialects(registry: DialectRegistry)` function that can extend the `DialectRegistry` that will be used to bootstrap the `Context`. * Define a `context_init_hook(context: Context)` function that will be added to a list of callbacks which will be invoked after dialect registration during `Context` initialization. Note that the `MLIRPythonExtension.RegisterEverything` is not included by default when building a downstream (its corresponding behavior was prior). For downstreams which need the default MLIR initialization to take place, they must add this back in to their Python CMake build just like they add their own components (i.e. to `add_mlir_python_common_capi_library` and `add_mlir_python_modules`). It is perfectly valid to not do this, in which case, only the things explicitly depended on and initialized by downstreams will be built/packaged. If the downstream has not been set up for this, it is recommended to simply add this back for the time being and pay the build time/package size cost. CMake changes: * `MLIRCAPIRegistration` -> `MLIRCAPIRegisterEverything` (renamed to signify what it does and force an evaluation: a number of places were incidentally linking this very expensive target) * `MLIRPythonSoure.Passes` removed (without replacement: just drop) * `MLIRPythonExtension.AllPassesRegistration` removed (without replacement: just drop) * `MLIRPythonExtension.Conversions` removed (without replacement: just drop) * `MLIRPythonExtension.Transforms` removed (without replacement: just drop) Header changes: * `mlir-c/Registration.h` is deleted. Dialect registration functionality is now in `IR.h`. Registration of upstream features are in `mlir-c/RegisterEverything.h`. When updating MLIR and a couple of downstreams, I found that proper usage was commingled so required making a choice vs just blind S&R. Python APIs removed: * mlir.transforms and mlir.conversions (previously only had an __init__.py which indirectly triggered `mlirRegisterTransformsPasses()` and `mlirRegisterConversionPasses()` respectively). Downstream impact: Remove these imports if present (they now happen as part of default initialization). * mlir._mlir_libs._all_passes_registration, mlir._mlir_libs._mlirTransforms, mlir._mlir_libs._mlirConversions. Downstream impact: None expected (these were internally used). C-APIs changed: * mlirRegisterAllDialects(MlirContext) now takes an MlirDialectRegistry instead. It also used to trigger loading of all dialects, which was already marked with a TODO to remove -- it no longer does, and for direct use, dialects must be explicitly loaded. Downstream impact: Direct C-API users must ensure that needed dialects are loaded or call `mlirContextLoadAllAvailableDialects(MlirContext)` to emulate the prior behavior. Also see the `ir.c` test case (e.g. ` mlirContextGetOrLoadDialect(ctx, mlirStringRefCreateFromCString("func"));`). * mlirDialectHandle* APIs were moved from Registration.h (which now is restricted to just global/upstream registration) to IR.h, arguably where it should have been. Downstream impact: include correct header (likely already doing so). C-APIs added: * mlirContextLoadAllAvailableDialects(MlirContext): Corresponds to C++ API with the same purpose. Python APIs added: * mlir.ir.DialectRegistry: Mapping for an MlirDialectRegistry. * mlir.ir.Context.append_dialect_registry(MlirDialectRegistry) * mlir.ir.Context.load_all_available_dialects() * mlir._mlir_libs._mlirAllRegistration: New native extension that exposes a `register_dialects(MlirDialectRegistry)` entry point and performs all upstream pass/conversion/transforms registration on init. In this first step, we eagerly load this as part of the __init__.py and use it to monkey patch the Context to emulate prior behavior. * Type caster and capsule support for MlirDialectRegistry This should make it possible to build downstream Python dialects that only depend on a subset of MLIR. See: #56037 Here is an example PR, minimally adapting IREE to these changes: https://github.com/iree-org/iree/pull/9638/files In this situation, IREE is opting to not link everything, since it is already configuring the Context to its liking. For projects that would just like to not think about it and pull in everything, add `MLIRPythonExtension.RegisterEverything` to the list of Python sources getting built, and the old behavior will continue. Reviewed By: mehdi_amini, ftynse Differential Revision: https://reviews.llvm.org/D128593
Should we consider this fixed by 5e83a5b ? |
I think so. There is always more cleanup that can be done, but I think the issue is fixed. |
…tion/loading activities. Since the very first commits, the Python and C MLIR APIs have had mis-placed registration/load functionality for dialects, extensions, etc. This was done pragmatically in order to get bootstrapped and then just grew in. Downstreams largely bypass and do their own thing by providing various APIs to register things they need. Meanwhile, the C++ APIs have stabilized around this and it would make sense to follow suit. The thing we have observed in canonical usage by downstreams is that each downstream tends to have native entry points that configure its installation to its preferences with one-stop APIs. This patch leans in to this approach with `RegisterEverything.h` and `mlir._mlir_libs._mlirRegisterEverything` being the one-stop entry points for the "upstream packages". The `_mlir_libs.__init__.py` now allows customization of the environment and Context by adding "initialization modules" to the `_mlir_libs` package. If present, `_mlirRegisterEverything` is treated as such a module. Others can be added by downstreams by adding a `_site_initialize_{i}.py` module, where '{i}' is a number starting with zero. The number will be incremented and corresponding module loaded until one is not found. Initialization modules can: * Perform load time customization to the global environment (i.e. registering passes, hooks, etc). * Define a `register_dialects(registry: DialectRegistry)` function that can extend the `DialectRegistry` that will be used to bootstrap the `Context`. * Define a `context_init_hook(context: Context)` function that will be added to a list of callbacks which will be invoked after dialect registration during `Context` initialization. Note that the `MLIRPythonExtension.RegisterEverything` is not included by default when building a downstream (its corresponding behavior was prior). For downstreams which need the default MLIR initialization to take place, they must add this back in to their Python CMake build just like they add their own components (i.e. to `add_mlir_python_common_capi_library` and `add_mlir_python_modules`). It is perfectly valid to not do this, in which case, only the things explicitly depended on and initialized by downstreams will be built/packaged. If the downstream has not been set up for this, it is recommended to simply add this back for the time being and pay the build time/package size cost. CMake changes: * `MLIRCAPIRegistration` -> `MLIRCAPIRegisterEverything` (renamed to signify what it does and force an evaluation: a number of places were incidentally linking this very expensive target) * `MLIRPythonSoure.Passes` removed (without replacement: just drop) * `MLIRPythonExtension.AllPassesRegistration` removed (without replacement: just drop) * `MLIRPythonExtension.Conversions` removed (without replacement: just drop) * `MLIRPythonExtension.Transforms` removed (without replacement: just drop) Header changes: * `mlir-c/Registration.h` is deleted. Dialect registration functionality is now in `IR.h`. Registration of upstream features are in `mlir-c/RegisterEverything.h`. When updating MLIR and a couple of downstreams, I found that proper usage was commingled so required making a choice vs just blind S&R. Python APIs removed: * mlir.transforms and mlir.conversions (previously only had an __init__.py which indirectly triggered `mlirRegisterTransformsPasses()` and `mlirRegisterConversionPasses()` respectively). Downstream impact: Remove these imports if present (they now happen as part of default initialization). * mlir._mlir_libs._all_passes_registration, mlir._mlir_libs._mlirTransforms, mlir._mlir_libs._mlirConversions. Downstream impact: None expected (these were internally used). C-APIs changed: * mlirRegisterAllDialects(MlirContext) now takes an MlirDialectRegistry instead. It also used to trigger loading of all dialects, which was already marked with a TODO to remove -- it no longer does, and for direct use, dialects must be explicitly loaded. Downstream impact: Direct C-API users must ensure that needed dialects are loaded or call `mlirContextLoadAllAvailableDialects(MlirContext)` to emulate the prior behavior. Also see the `ir.c` test case (e.g. ` mlirContextGetOrLoadDialect(ctx, mlirStringRefCreateFromCString("func"));`). * mlirDialectHandle* APIs were moved from Registration.h (which now is restricted to just global/upstream registration) to IR.h, arguably where it should have been. Downstream impact: include correct header (likely already doing so). C-APIs added: * mlirContextLoadAllAvailableDialects(MlirContext): Corresponds to C++ API with the same purpose. Python APIs added: * mlir.ir.DialectRegistry: Mapping for an MlirDialectRegistry. * mlir.ir.Context.append_dialect_registry(MlirDialectRegistry) * mlir.ir.Context.load_all_available_dialects() * mlir._mlir_libs._mlirAllRegistration: New native extension that exposes a `register_dialects(MlirDialectRegistry)` entry point and performs all upstream pass/conversion/transforms registration on init. In this first step, we eagerly load this as part of the __init__.py and use it to monkey patch the Context to emulate prior behavior. * Type caster and capsule support for MlirDialectRegistry This should make it possible to build downstream Python dialects that only depend on a subset of MLIR. See: llvm#56037 Here is an example PR, minimally adapting IREE to these changes: https://github.com/iree-org/iree/pull/9638/files In this situation, IREE is opting to not link everything, since it is already configuring the Context to its liking. For projects that would just like to not think about it and pull in everything, add `MLIRPythonExtension.RegisterEverything` to the list of Python sources getting built, and the old behavior will continue. Reviewed By: mehdi_amini, ftynse Differential Revision: https://reviews.llvm.org/D128593
…tion/loading activities. Since the very first commits, the Python and C MLIR APIs have had mis-placed registration/load functionality for dialects, extensions, etc. This was done pragmatically in order to get bootstrapped and then just grew in. Downstreams largely bypass and do their own thing by providing various APIs to register things they need. Meanwhile, the C++ APIs have stabilized around this and it would make sense to follow suit. The thing we have observed in canonical usage by downstreams is that each downstream tends to have native entry points that configure its installation to its preferences with one-stop APIs. This patch leans in to this approach with `RegisterEverything.h` and `mlir._mlir_libs._mlirRegisterEverything` being the one-stop entry points for the "upstream packages". The `_mlir_libs.__init__.py` now allows customization of the environment and Context by adding "initialization modules" to the `_mlir_libs` package. If present, `_mlirRegisterEverything` is treated as such a module. Others can be added by downstreams by adding a `_site_initialize_{i}.py` module, where '{i}' is a number starting with zero. The number will be incremented and corresponding module loaded until one is not found. Initialization modules can: * Perform load time customization to the global environment (i.e. registering passes, hooks, etc). * Define a `register_dialects(registry: DialectRegistry)` function that can extend the `DialectRegistry` that will be used to bootstrap the `Context`. * Define a `context_init_hook(context: Context)` function that will be added to a list of callbacks which will be invoked after dialect registration during `Context` initialization. Note that the `MLIRPythonExtension.RegisterEverything` is not included by default when building a downstream (its corresponding behavior was prior). For downstreams which need the default MLIR initialization to take place, they must add this back in to their Python CMake build just like they add their own components (i.e. to `add_mlir_python_common_capi_library` and `add_mlir_python_modules`). It is perfectly valid to not do this, in which case, only the things explicitly depended on and initialized by downstreams will be built/packaged. If the downstream has not been set up for this, it is recommended to simply add this back for the time being and pay the build time/package size cost. CMake changes: * `MLIRCAPIRegistration` -> `MLIRCAPIRegisterEverything` (renamed to signify what it does and force an evaluation: a number of places were incidentally linking this very expensive target) * `MLIRPythonSoure.Passes` removed (without replacement: just drop) * `MLIRPythonExtension.AllPassesRegistration` removed (without replacement: just drop) * `MLIRPythonExtension.Conversions` removed (without replacement: just drop) * `MLIRPythonExtension.Transforms` removed (without replacement: just drop) Header changes: * `mlir-c/Registration.h` is deleted. Dialect registration functionality is now in `IR.h`. Registration of upstream features are in `mlir-c/RegisterEverything.h`. When updating MLIR and a couple of downstreams, I found that proper usage was commingled so required making a choice vs just blind S&R. Python APIs removed: * mlir.transforms and mlir.conversions (previously only had an __init__.py which indirectly triggered `mlirRegisterTransformsPasses()` and `mlirRegisterConversionPasses()` respectively). Downstream impact: Remove these imports if present (they now happen as part of default initialization). * mlir._mlir_libs._all_passes_registration, mlir._mlir_libs._mlirTransforms, mlir._mlir_libs._mlirConversions. Downstream impact: None expected (these were internally used). C-APIs changed: * mlirRegisterAllDialects(MlirContext) now takes an MlirDialectRegistry instead. It also used to trigger loading of all dialects, which was already marked with a TODO to remove -- it no longer does, and for direct use, dialects must be explicitly loaded. Downstream impact: Direct C-API users must ensure that needed dialects are loaded or call `mlirContextLoadAllAvailableDialects(MlirContext)` to emulate the prior behavior. Also see the `ir.c` test case (e.g. ` mlirContextGetOrLoadDialect(ctx, mlirStringRefCreateFromCString("func"));`). * mlirDialectHandle* APIs were moved from Registration.h (which now is restricted to just global/upstream registration) to IR.h, arguably where it should have been. Downstream impact: include correct header (likely already doing so). C-APIs added: * mlirContextLoadAllAvailableDialects(MlirContext): Corresponds to C++ API with the same purpose. Python APIs added: * mlir.ir.DialectRegistry: Mapping for an MlirDialectRegistry. * mlir.ir.Context.append_dialect_registry(MlirDialectRegistry) * mlir.ir.Context.load_all_available_dialects() * mlir._mlir_libs._mlirAllRegistration: New native extension that exposes a `register_dialects(MlirDialectRegistry)` entry point and performs all upstream pass/conversion/transforms registration on init. In this first step, we eagerly load this as part of the __init__.py and use it to monkey patch the Context to emulate prior behavior. * Type caster and capsule support for MlirDialectRegistry This should make it possible to build downstream Python dialects that only depend on a subset of MLIR. See: llvm/llvm-project#56037 Here is an example PR, minimally adapting IREE to these changes: https://github.com/iree-org/iree/pull/9638/files In this situation, IREE is opting to not link everything, since it is already configuring the Context to its liking. For projects that would just like to not think about it and pull in everything, add `MLIRPythonExtension.RegisterEverything` to the list of Python sources getting built, and the old behavior will continue. Reviewed By: mehdi_amini, ftynse Differential Revision: https://reviews.llvm.org/D128593
Since the work last year to refactor how MLIR Python extensions are linked, we have had this comment about dis-aggregating MLIRPythonExtension.Core:
llvm-project/mlir/python/CMakeLists.txt
Line 279 in 6ccc273
My understanding is this dependency on MLIRCAPIRegistration is pulling in all the MLIR dialects, translations, and conversions:
llvm-project/mlir/lib/CAPI/Registration/CMakeLists.txt
Lines 1 to 14 in 10affe7
Some downstream projects may not want all of that, and may just want to depend on the "core" MLIR Python extension. This is definitely the case in the CIRCT project.
I wanted to open an issue before diving in to see if anyone has thoughts about how to go about this.
cc @stellaraccident
The text was updated successfully, but these errors were encountered: