Skip to content

bpo-45020: Add -X frozen_modules=[on|off] to explicitly control use of frozen modules. #28320

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

ericsnowcurrently
Copy link
Member

@ericsnowcurrently ericsnowcurrently commented Sep 13, 2021

Currently we freeze several modules into the runtime. For each of these modules it is essential to bootstrapping the runtime that they be frozen. Any other stdlib module that we later freeze into the runtime is not essential. We can just as well import from the .py file.

This PR lets users explicitly choose which should be used, with the new "-X frozen_modules=[on|off]" CLI flag. The default is "on" for non-debug builds and "off" for debug builds. The latter is for the benefit of CPython contributors, so source changes they make are used immediately without having to run make regen-frozen. FWIW, we may try making the default logic a little more sophisticated for CPython contributors, but switching based on --with-pydebug should be enough at first.

(FYI, these changes come from gh-28107. @gvanrossum already reviewed them there.)

https://bugs.python.org/issue45020

@gvanrossum
Copy link
Member

So I still think non-debug, non-pgo builds should default to not frozen.

Ah, I think I misunderstood before. So are you saying the only time we should default to using frozen modules is if it's a PGO build? I was going with: the only time we don't default to frozen is if we're developing (running out of the repo). It sounds like you had something else in mind.

Yeah, since we no longer consider whether it's installed, there are three basic build modes that may be relevant:

  1. --with-pydebug
  2. default
  3. --enable-optimizations

Your assumption seems to be that we recommend developers use (1), and only release managers use (3), when actually building a release.

But my assumption (and my workflow) is that usually (2) works just fine for most purposes. And it's the default for a reason, right? In particular people who are only editing Python code (like Steven D'Aprano or Ethan Furman, both core devs who never touch C code) have no reason to use (1) or (3), so they likely use (2). They do a checkout, run ./configure and make once, and then just play around with their Python modules. If we choose the default for (non-essential) frozen modules to be off for modes (1) and (2), and on in mode (3), we don't disturb the workflow of those developers at all.

Developers working on the freezing code (or something that affects freezing, like marshal) can turn on -X frozen_modules to test their work regardless of whether they use mode (1) or (2), or they can work in mode (3) if they want to benchmark specific performance aspects.

And the changes to initconfig.c look unwieldy still.

Any specific recommendations? Keep in mind that the only real change I've made is adding _PyConfig_InitImportConfig() and using it. The other code I added is extensions of the existing code, for a bool field on PyConfig. I suppose I could make use_frozen_modules "int" instead of "bool", like several of the other (effectively) bool fields. However, I don't see the disadvantage to starting to use bool here. Regardless, I'd appreciate your perspective on this.

I have to admit I don't understand this code very well, I've never worked with it before, and I don't know what constraints it has. The thought of dropping specific bool support and just making this an int flag came to me too, so maybe that's indeed a useful simplification -- a lot of the code you added seems to be infrastructure for a bool type.

Possibly my hangup is that I don't know what the purpose of _PyConfig_InitImportConfig() is. Is it just to set config->use_frozen_modules? I tried to look at the call sites and I don't follow what they are doing either, so this seems to be a problem on my side. I'll try to study it more.

@ericsnowcurrently
Copy link
Member Author

Yeah, since we no longer consider whether it's installed, there are three basic build modes that may be relevant:

  1. --with-pydebug
  2. default
  3. --enable-optimizations

Your assumption seems to be that we recommend developers use (1), and only release managers use (3), when actually building a release.

But my assumption (and my workflow) is that usually (2) works just fine for most purposes. And it's the default for a reason, right? In particular people who are only editing Python code (like Steven D'Aprano or Ethan Furman, both core devs who never touch C code) have no reason to use (1) or (3), so they likely use (2). They do a checkout, run ./configure and make once, and then just play around with their Python modules. If we choose the default for (non-essential) frozen modules to be off for modes (1) and (2), and on in mode (3), we don't disturb the workflow of those developers at all.

Developers working on the freezing code (or something that affects freezing, like marshal) can turn on -X frozen_modules to test their work regardless of whether they use mode (1) or (2), or they can work in mode (3) if they want to benchmark specific performance aspects.

All that makes sense. I'll go ahead with the switch, though I'll probably still tackle the PGO check in a separate PR. The only downside is that folks using an installed python without --enable-optimizations won't get the benefit of the frozen modules (by default). Of course, the whole issue is irrelevant once I get the is-running-in-the-repo logic sorted out.

@ericsnowcurrently
Copy link
Member Author

I have to admit I don't understand this code very well, I've never worked with it before, and I don't know what constraints it has. The thought of dropping specific bool support and just making this an int flag came to me too, so maybe that's indeed a useful simplification -- a lot of the code you added seems to be infrastructure for a bool type.

I suppose it's a question of whether or not using bool is worth it. (Personally, I find it more readable.) If it is worth it then what better time to add the necessary helpers than now? The helpers are mostly copy-and-paste adaptations of the existing ones.

Possibly my hangup is that I don't know what the purpose of _PyConfig_InitImportConfig() is. Is it just to set config->use_frozen_modules?

_PyConfig_InitImportConfig() is for initializing the runtime config related to the import system. Up to this point that only involved setting up what would later become sys.path, which is essentially what _PyConfig_InitPathConfig() does. However, with this PR we also initialize PyConfig.use_frozen_modules here.

At first I added a function just to initialize use_frozen_modules but I found things got a lot cleaner if I pulled the code together into _PyConfig_InitImportConfig() instead, in part because use_frozen_modules was fairly tightly coupled to the result of _PyConfig_InitPathConfig() at the time. Currently they aren't so coupled, but I expect we'll get there once we sort out the running-in-the-repo logic.

I tried to look at the call sites and I don't follow what they are doing either, so this seems to be a problem on my side. I'll try to study it more.

Don't stress too much. The runtime initialization code is among the most complex in the repo, even after all the work Nick and Victor have done to clean it up.

@gvanrossum
Copy link
Member

Okay, I still vote to get rid of the bool infrastructure and just use ints. I've been writing C code using ints as flags most of my life. :-)

Regarding the other thing, I guess this bit of code at line 2200 in initconfig.c is most mystifying to me:

    if (config->_install_importlib) {
        if (compute_path_config) {
            status = _PyConfig_InitImportConfig(config);
        }
        else {
            status = _PyConfig_InitPathConfig(config, 0);
        }
        if (_PyStatus_EXCEPTION(status)) {
            return status;
        }
    }

It seems to say that if we're installing importlib but we're not computing the path configuration, don't initialize use_frozen_modules? It almost seems to be simpler to remove the call to _PyConfig_InitPathConfig() from _PyConfig_InitImportConfig(), so the latter is purely concerned with setting config->use_frozen_modules, and the above code block can be something like

status = _PyConfig_InitPathConfig(config, compute_path_config);
// Error check
status = _PyConfig_InitImportConfig(config);
// Error check

@ericsnowcurrently
Copy link
Member Author

Regarding the other thing, I guess this bit of code at line 2200 in initconfig.c is most mystifying to me: ...

Thanks for clarifying. Yeah, that's harder to following than it needs to be. That code is partly a holdover from earlier when I was depending on the results of _PyConfig_InitPathConfig(). I've fixed it.

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting pretty close...

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits left only. After this let me look at it once more from a holistic POV, but I think it's good.

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one unused function. Otherwise LG. Congrats!

@ericsnowcurrently
Copy link
Member Author

Thanks for all the helpful reviewing!

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anchors away!

@ericsnowcurrently ericsnowcurrently merged commit a65c868 into python:main Sep 14, 2021
ericsnowcurrently added a commit that referenced this pull request Sep 15, 2021
)

Doing this provides significant performance gains for runtime startup (~15% with all the imported modules frozen). We don't yet freeze all the imported modules because there are a few hiccups in the build systems we need to sort out first. (See bpo-45186 and bpo-45188.)

Note that in PR GH-28320 we added a command-line flag (-X frozen_modules=[on|off]) that allows users to opt out of (or into) using frozen modules. The default is still "off" but we will change it to "on" as soon as we can do it in a way that does not cause contributors pain.

https://bugs.python.org/issue45020
@ericsnowcurrently ericsnowcurrently deleted the frozen-modules-dev-default-off-basic branch September 15, 2021 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants