Skip to content

bpo-45324: Capture data in FrozenImporter.find_spec() to use in exec_module(). #28633

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

ericsnowcurrently
Copy link
Member

@ericsnowcurrently ericsnowcurrently commented Sep 29, 2021

Currently we end up duplicating effort and throwing away data in FrozenImporter.find_spec(). With this change we do the work once in find_spec() and the only thing we do in FrozenImporter.exec_module() is turn the raw frozen data into a code object and then exec it.

The change involves adding _imp.find_frozen(), adding an arg to _imp.get_frozen_object(), and updating FrozenImporter. We also move some code around to reduce some duplication, get a little more consistency in outcomes, and be more efficient.

Note that this change is mostly necessary if we want to set __file__ on frozen stdlib modules. (See https://bugs.python.org/issue21736.)

https://bugs.python.org/issue45324

@ericsnowcurrently ericsnowcurrently force-pushed the frozen-modules-find-spec branch from f7c009e to c6e70cc Compare October 1, 2021 00:43
@ericsnowcurrently ericsnowcurrently added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Oct 1, 2021
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @ericsnowcurrently for commit c6e70cc 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Oct 1, 2021
@ericsnowcurrently ericsnowcurrently added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Oct 1, 2021
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @ericsnowcurrently for commit c6e70cc 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Oct 1, 2021
@ericsnowcurrently ericsnowcurrently force-pushed the frozen-modules-find-spec branch from 0c94188 to 8b8cf51 Compare October 1, 2021 18:21
@ericsnowcurrently ericsnowcurrently force-pushed the frozen-modules-find-spec branch from 8b8cf51 to d1b1868 Compare October 1, 2021 18:46
Copy link
Member

@brettcannon brettcannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of my questions are probably more, "add a comment" or "rename to be clearer". 😄

spec = spec_from_loader(fullname, cls,
origin=cls._ORIGIN,
is_package=ispkg)
spec.loader_state = (data,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the tuple? It isn't future-proofing anything as you will still have to change any other accesses unless you index into this (which you don't; see line 860). Otherwise I dictionary could also be used if you're trying to avoid issues w/ the future needing more than one piece of data.

But honestly, I would just keep it simple for now and directly assign to the attribute.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

name = spec.name
try:
data, = spec.loader_state
except Exception:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is extremely broad for just reading an attribute. What are you specifically guarding against? If you're worried about the attribute not being set can't you just check for None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I thought I'd changed that to AttributeError. I'll fix it.

name=name)
data = None
else:
# We clear the extra data we got from the finder, to save memory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the loader is one-time use? What happens if you call exec_module() again (e.g. importlib.reload())? Won't this break that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should work fine. If the data is set then the _imp.get_frozen_object() call in exec_module() will unmarshal that. If the data isn't set (e.g. with a reload) then get_frozen_object() will look up the frozen data, in addition to unmarshaling it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added to that comment to clarify.

FROZEN_BAD_NAME, // The given module name wasn't valid.
FROZEN_NOT_FOUND, // It wasn't in PyImport_FrozenModules.
FROZEN_DISABLED, // -X frozen_modules=off (and not essential)
FROZEN_EXCLUDED, // The PyImport_FrozenModules entry has NULL "code".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean the search should continue somewhere else? When is this used to block the import search somewhere else?

Copy link
Member Author

@ericsnowcurrently ericsnowcurrently Oct 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the original motivation of that status is. It pre-dates this PR. I only consolidated the case into the enum.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind investigating but it falls outside the scope of this PR and it would be hard to justify prioritizing it otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FROZEN_NOT_FOUND, // It wasn't in PyImport_FrozenModules.
FROZEN_DISABLED, // -X frozen_modules=off (and not essential)
FROZEN_EXCLUDED, // The PyImport_FrozenModules entry has NULL "code".
FROZEN_INVALID, // The PyImport_FrozenModules entry is bogus.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "invalid" mean? Do you mean "corrupt"? I'm just trying to think of a user who sees this and wonder how they might be able to resolve it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means someone manually put invalid data into PyImport_FrozenModules (or that Tools/scripts/freeze_modules.py is broken). We were already treating this case as FROZEN_NOT_FOUND. I only teased out a separate enum member to make the condition more clear.

I don't mind adding more notes in the comment but to me the best thing to do is to look at the places where that status is emitted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind adding more notes in the comment but to me the best thing to do is to look at the places where that status is emitted.

Do you mean in set_frozen_error()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to the enum value rather than the exception. In the case of FROZEN_INVALID, that would be in find_frozen() where we return FROZEN_INVALID (or likewise unmarshal_frozen_code() or _imp.get_frozen_object()).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (name == NULL) {
PyErr_Clear();
return NULL;
return FROZEN_BAD_NAME;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does PyUnicode_AsUTF8() not set an exception? That seems like something that should propagated up instead of silenced.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does set an exception, hence PyErr_Clear(). I added that in gh-28320, where PyUnicode_AsUTF8() replaced the existing usage of _PyUnicode_EqualToASCIIString(). The latter either asserts in error cases or ignores the errors, so I followed suit, effectively matching the existing behavior of the code. I don't mind propagating the error but didn't want to change behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a note.

}

if (!use_frozen() && !is_essential_frozen_module(name)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does use_frozen() represent? The use of && suggests that function represents use_essential_frozen_modules() would be a better name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use_frozen() says whether or not frozen modules should be used. (See the -X frozen_modules CLI option.) We call is_essential_module() to make sure we still use the necessary frozen modules. If you are suggesting that I combine the two functions, note that I did so in that original PR, @zooba suggested I not, and I agreed. 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is I thought -X frozen_modules is for built-in frozen modules, not all frozen modules, and so use_frozen() doesn't seemed labelled appropriately based on that understanding.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for clarifying. What do you mean by "built-in"? I wasn't aware of different kinds of frozen modules, other than that a few of them must be used no matter what for importlib bootstrap.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but what this also means is if I use frozen modules in my project and I for some reason use -X frozen_modules then you just broke me. That's probably fine, but I was personally viewing all of this on/off work as only affecting what was in the stdlib, not the entire frozen module mechanism (except for the "essential" modules which are technically just the import-related ones). That's why the names were throwing me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that makes sense. So should we restrict the flag to stdlib modules (-X frozen_stdlib)? If so then yeah, the function should be named "use_frozen_stdlib".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what's best, but at least we now understand why I kept getting confused. 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. We can deal with "-X frozen_stdlib" separately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I'm going to change the -X option to "frozen_stdlib" (and the import.c function to "use_frozen_stdlib"). Thanks for pointing this out.

name);
return NULL;
if (info != NULL) {
info->nameobj = nameobj; // borrowed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
info->nameobj = nameobj; // borrowed
info->nameobj = nameobj; // Borrowed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't a sentence and I'd rather not make it one. I'd rather drop the comment then add noise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment itself can't hurt but yeah, it doesn't need to be sentence-ized.

self.assertIsNotNone(spec.loader_state)

def check_search_location(self, spec, source=None):
# For now frozen packages do not have any path entries.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# For now frozen packages do not have any path entries.
# Frozen packages do not have any path entries.

@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@ericsnowcurrently
Copy link
Member Author

I have made the requested changes; please review again.

@bedevere-bot
Copy link

Thanks for making the requested changes!

@brettcannon: please review the changes made to this pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants