-
-
Notifications
You must be signed in to change notification settings - Fork 32k
bpo-45324: Capture data in FrozenImporter.find_spec() to use in exec_module(). #28633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-45324: Capture data in FrozenImporter.find_spec() to use in exec_module(). #28633
Conversation
f7c009e
to
c6e70cc
Compare
🤖 New build scheduled with the buildbot fleet by @ericsnowcurrently for commit c6e70cc 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again. |
🤖 New build scheduled with the buildbot fleet by @ericsnowcurrently for commit c6e70cc 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again. |
0c94188
to
8b8cf51
Compare
8b8cf51
to
d1b1868
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of my questions are probably more, "add a comment" or "rename to be clearer". 😄
Lib/importlib/_bootstrap.py
Outdated
spec = spec_from_loader(fullname, cls, | ||
origin=cls._ORIGIN, | ||
is_package=ispkg) | ||
spec.loader_state = (data,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the tuple? It isn't future-proofing anything as you will still have to change any other accesses unless you index into this (which you don't; see line 860). Otherwise I dictionary could also be used if you're trying to avoid issues w/ the future needing more than one piece of data.
But honestly, I would just keep it simple for now and directly assign to the attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
Lib/importlib/_bootstrap.py
Outdated
name = spec.name | ||
try: | ||
data, = spec.loader_state | ||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is extremely broad for just reading an attribute. What are you specifically guarding against? If you're worried about the attribute not being set can't you just check for None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I thought I'd changed that to AttributeError. I'll fix it.
name=name) | ||
data = None | ||
else: | ||
# We clear the extra data we got from the finder, to save memory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the loader is one-time use? What happens if you call exec_module() again (e.g. importlib.reload()
)? Won't this break that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should work fine. If the data is set then the _imp.get_frozen_object()
call in exec_module()
will unmarshal that. If the data isn't set (e.g. with a reload) then get_frozen_object()
will look up the frozen data, in addition to unmarshaling it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added to that comment to clarify.
FROZEN_BAD_NAME, // The given module name wasn't valid. | ||
FROZEN_NOT_FOUND, // It wasn't in PyImport_FrozenModules. | ||
FROZEN_DISABLED, // -X frozen_modules=off (and not essential) | ||
FROZEN_EXCLUDED, // The PyImport_FrozenModules entry has NULL "code". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean the search should continue somewhere else? When is this used to block the import search somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the original motivation of that status is. It pre-dates this PR. I only consolidated the case into the enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind investigating but it falls outside the scope of this PR and it would be hard to justify prioritizing it otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FROZEN_NOT_FOUND, // It wasn't in PyImport_FrozenModules. | ||
FROZEN_DISABLED, // -X frozen_modules=off (and not essential) | ||
FROZEN_EXCLUDED, // The PyImport_FrozenModules entry has NULL "code". | ||
FROZEN_INVALID, // The PyImport_FrozenModules entry is bogus. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "invalid" mean? Do you mean "corrupt"? I'm just trying to think of a user who sees this and wonder how they might be able to resolve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means someone manually put invalid data into PyImport_FrozenModules
(or that Tools/scripts/freeze_modules.py is broken). We were already treating this case as FROZEN_NOT_FOUND
. I only teased out a separate enum member to make the condition more clear.
I don't mind adding more notes in the comment but to me the best thing to do is to look at the places where that status is emitted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind adding more notes in the comment but to me the best thing to do is to look at the places where that status is emitted.
Do you mean in set_frozen_error()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to the enum value rather than the exception. In the case of FROZEN_INVALID
, that would be in find_frozen()
where we return FROZEN_INVALID
(or likewise unmarshal_frozen_code()
or _imp.get_frozen_object()
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (name == NULL) { | ||
PyErr_Clear(); | ||
return NULL; | ||
return FROZEN_BAD_NAME; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does PyUnicode_AsUTF8()
not set an exception? That seems like something that should propagated up instead of silenced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does set an exception, hence PyErr_Clear()
. I added that in gh-28320, where PyUnicode_AsUTF8()
replaced the existing usage of _PyUnicode_EqualToASCIIString()
. The latter either asserts in error cases or ignores the errors, so I followed suit, effectively matching the existing behavior of the code. I don't mind propagating the error but didn't want to change behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a note.
} | ||
|
||
if (!use_frozen() && !is_essential_frozen_module(name)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does use_frozen()
represent? The use of &&
suggests that function represents use_essential_frozen_modules()
would be a better name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use_frozen()
says whether or not frozen modules should be used. (See the -X frozen_modules
CLI option.) We call is_essential_module()
to make sure we still use the necessary frozen modules. If you are suggesting that I combine the two functions, note that I did so in that original PR, @zooba suggested I not, and I agreed. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is I thought -X frozen_modules
is for built-in frozen modules, not all frozen modules, and so use_frozen()
doesn't seemed labelled appropriately based on that understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for clarifying. What do you mean by "built-in"? I wasn't aware of different kinds of frozen modules, other than that a few of them must be used no matter what for importlib bootstrap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but what this also means is if I use frozen modules in my project and I for some reason use -X frozen_modules
then you just broke me. That's probably fine, but I was personally viewing all of this on/off work as only affecting what was in the stdlib, not the entire frozen module mechanism (except for the "essential" modules which are technically just the import-related ones). That's why the names were throwing me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that makes sense. So should we restrict the flag to stdlib modules (-X frozen_stdlib
)? If so then yeah, the function should be named "use_frozen_stdlib".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what's best, but at least we now understand why I kept getting confused. 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. We can deal with "-X frozen_stdlib" separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I'm going to change the -X option to "frozen_stdlib" (and the import.c function to "use_frozen_stdlib"). Thanks for pointing this out.
name); | ||
return NULL; | ||
if (info != NULL) { | ||
info->nameobj = nameobj; // borrowed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info->nameobj = nameobj; // borrowed | |
info->nameobj = nameobj; // Borrowed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't a sentence and I'd rather not make it one. I'd rather drop the comment then add noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment itself can't hurt but yeah, it doesn't need to be sentence-ized.
self.assertIsNotNone(spec.loader_state) | ||
|
||
def check_search_location(self, spec, source=None): | ||
# For now frozen packages do not have any path entries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# For now frozen packages do not have any path entries. | |
# Frozen packages do not have any path entries. |
When you're done making the requested changes, leave the comment: |
I have made the requested changes; please review again. |
Thanks for making the requested changes! @brettcannon: please review the changes made to this pull request. |
Currently we end up duplicating effort and throwing away data in
FrozenImporter.find_spec()
. With this change we do the work once infind_spec()
and the only thing we do inFrozenImporter.exec_module()
is turn the raw frozen data into a code object and then exec it.The change involves adding
_imp.find_frozen()
, adding an arg to_imp.get_frozen_object()
, and updatingFrozenImporter
. We also move some code around to reduce some duplication, get a little more consistency in outcomes, and be more efficient.Note that this change is mostly necessary if we want to set
__file__
on frozen stdlib modules. (See https://bugs.python.org/issue21736.)https://bugs.python.org/issue45324