Skip to content

Force the builtin module key to be the correct type. #2814

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 24, 2021

Conversation

crimsoncor
Copy link
Contributor

@crimsoncor crimsoncor commented Jan 22, 2021

Previously it was always going to be a std::string which converted into
unicode. Python 2 appears to want module keys to be normal str types, so
this was breaking code that expected plain string types in the
builtins.keys() data structure

Description

Suggested changelog entry:

* The module key in builtins that pybind11 uses to store its internals changed from std::string to a python str type (more natural on Python 2, no change on Python 3)

Previously it was always going to be a std::string which converted into
unicode. Python 2 appears to want module keys to be normal str types, so
this was breaking code that expected plain string types in the
builtins.keys() data structure
@YannickJadoul
Copy link
Collaborator

We discussed this on Gitter. I agree that it makes sense that a module's __dict__'s keys would be the Python version's str (so py::bytes for Python 2) rather than always py::str (so unicode in Python 2).

Fun thing, this is actually not breaking the ABI, and (probably) not even stopping different pybind11 libraries from interacting, since:

Python 2.7.17 (default, Sep 30 2020, 13:38:04) 
[GCC 7.5.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> d = {}
>>> d[u'answer'] = 42
>>> d[u'answer']
42
>>> d['answer']
42
>>> type(u'answer')
<type 'unicode'>
>>> type('answer')
<type 'str'>
>>> 

Do you mind checking this, @crimsoncor, whether a pybind11 library from our current master manages to interact with another library that's built with this PR (on Python 2) ? If so, I don't really see a reason why this couldn't go into pybind11 2.7. (Potentially even 2.6.2, but that's maybe too close to release to still add this.)

@YannickJadoul YannickJadoul added this to the v2.7 milestone Jan 22, 2021
@crimsoncor
Copy link
Contributor Author

Did some very basic testing of this (source code in the attached zip). I built one module with a pre-patch pybind11 and the other with post-patch (and then also flipped the order). In all cases, both modules were able to access the same internals block. What I did notice was that which one was loaded first determined the type of the argument in the builtin dict:

/crypt/code/other/pythonTest> python
Python 2.7.18 (default, Apr 23 2020, 09:27:04) [GCC] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import foo
>>> import bar
>>> print( {type(k) for k in __builtins__.__dict__.keys()})
set([<type 'unicode'>, <type 'str'>])
>>> 
/crypt/code/other/pythonTest> python
Python 2.7.18 (default, Apr 23 2020, 09:27:04) [GCC] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import bar
>>> import foo
>>> print( {type(k) for k in __builtins__.__dict__.keys()})
set([<type 'str'>])

But even with that said, regardless of the type of the key, it still worked for both modules. Also double checked to make sure there was ever only a single entry in the map (despite it being str vs unicode)

/crypt/code/other/pythonTest> python
Python 2.7.18 (default, Apr 23 2020, 09:27:04) [GCC] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import foo
>>> import bar
>>> for k in __builtins__.__dict__.keys():
...   if "pybind" in k:
...     print(k)
... 
__pybind11_internals_v4_gcc_libstdcpp_cxxabi1011__
>>> 
/crypt/code/other/pythonTest> python
Python 2.7.18 (default, Apr 23 2020, 09:27:04) [GCC] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import bar
>>> import foo
>>> for k in __builtins__.__dict__.keys():
...   if "pybind" in k:
...     print(k)
... 
__pybind11_internals_v4_gcc_libstdcpp_cxxabi1011__

And that looks like it also was correct

pythonTest.zip

@YannickJadoul
Copy link
Collaborator

@henryiii, I'm going to say it's your call on whether we've already stopped accepting PRs for 2.6.2 :-P

@henryiii
Copy link
Collaborator

If we can get another global testing run (which I think we will, @rwgk ?), maybe over the weekend, and release Monday, perhaps?

@rwgk
Copy link
Collaborator

rwgk commented Jan 22, 2021

If we can get another global testing run

OK, np, I'll include this PR in the next run.

@rwgk
Copy link
Collaborator

rwgk commented Jan 22, 2021

@crimsoncor if you could add a minimal test that would be great.

@crimsoncor
Copy link
Contributor Author

@crimsoncor if you could add a minimal test that would be great.

What sort of test do you think would be applicable? The existing tests that exercise internals being shared across multiple modules continue to function fine, which shows that changing the type of the key in python 2.7 has not caused any regression.

The only other test that seems like it might be applicable would be testing a module built with an older version and a module built with the new code (like I did manually) and I'm not sure how to add that in any sane fashion in the pybind11 test suite.

@YannickJadoul
Copy link
Collaborator

@crimsoncor, what about just adding a single test that checks {type(k) for k in __builtins__.__dict__.keys()} == {str} ?

@crimsoncor
Copy link
Contributor Author

done

@rwgk
Copy link
Collaborator

rwgk commented Jan 22, 2021

Awesome, thanks!
Minor issue: no newline at the end of the modified test file (I'm guessing some CI component will fail because of it).

@henryiii henryiii modified the milestones: v2.7, v2.6.2 Jan 22, 2021
if hasattr(__builtins__, "keys"):
keys = __builtins__.keys()
else: # this is to make pypy happy since builtins is different there.
keys = __builtins__.__dict__.keys()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply always go through __builtins__.__dict__.keys()? It works in CPython and PyPy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

answered in gitter, but I'll add here as well. Python 2 has the following amazing feature

    In module __main__:
        __builtins__ is a reference to module __builtin__.
        __builtin__ only exists if you import it.
    In any other module:
        __builtins__ is a reference to module __builtin__'s __dict__.
        __builtin__ only exists if you import it.

so we need to handle both cases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sigh

Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran this through our global testing and found no issues.

@rwgk rwgk merged commit 9ea39dc into pybind:master Jan 24, 2021
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Jan 24, 2021
@YannickJadoul
Copy link
Collaborator

YannickJadoul commented Jan 25, 2021

Thanks for creating this PR, @crimsoncor / @jesse-sony!

@henryiii henryiii removed the needs changelog Possibly needs a changelog entry label Jan 25, 2021
@jesse-sony jesse-sony deleted the feature/builtinsKeyType branch April 26, 2021 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants