Skip to content

Regression in sys.path construction behaviour of Py_InitializeFromConfig() #92913

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
freakboy3742 opened this issue May 18, 2022 · 9 comments · Fixed by #92980
Closed

Regression in sys.path construction behaviour of Py_InitializeFromConfig() #92913

freakboy3742 opened this issue May 18, 2022 · 9 comments · Fixed by #92980
Labels
type-bug An unexpected behavior, bug, or error

Comments

@freakboy3742
Copy link
Contributor

Bug report

3.11.0b1 appears to have a regression in the behavior of Py_InitializeFromConfig() in the way that sys.path is computed.

Reproduction case 1:

This code attempts to set the path through the config.pythonpath_env variable.

int main(int argc, char *argv[]) {
    PyConfig config;
    PyConfig_InitIsolatedConfig(&config);

    PyConfig_SetString(&config, &config.pythonpath_env, L"/path/to/app:/other/path");

    Py_InitializeFromConfig(&config);   
    PyRun_SimpleString("import sys; print(f'{sys.path=}')");
    return 0
}

In Python 3.8.13, 3.9.12 and 3.10.4, this code shows 6 items when sys.path is printed:

  • /path/to/app
  • /other/path
  • /path/to/Python/lib/python3X.zip
  • /path/to/Python/lib/python3.X
  • /path/to/Python/lib/python3.X/lib-dynload
  • /path/to/Python/lib/python3.X/site-packages

In Python 3.11.0b1, the first two items are not present.

Reproduction case 2:

This example is a closer match to the "more complete example" in the docs; it reads a full configuration, then appending the extra search paths.

int main(int argc, char *argv[]) {
    PyConfig config;
    PyConfig_InitIsolatedConfig(&config);

    PyConfig_Read(&config);
    PyWideStringList_Append(&config.module_search_paths, L"/path/to/app");
    PyWideStringList_Append(&config.module_search_paths, L"/other/path");

    Py_InitializeFromConfig(&config);   
    PyRun_SimpleString("import sys; print(f'{sys.path=}')");
    return 0
}

Again, in Python 3.8.13, 3.9.12 and 3.10.4, this code shows 6 items when sys.path is printed (although the order is different):

  • /path/to/Python/lib/python3X.zip
  • /path/to/Python/lib/python3.X
  • /path/to/Python/lib/python3.X/lib-dynload
  • /path/to/app
  • /other/path
  • /path/to/Python/lib/python3.X/site-packages

In Python 3.11.0b1, this raises an error, dumping the path configuration with _Py_DumpPathConfig(). This appears to be because it's modifying the path enough to make the encodings module unimportable.

If the 2 path appends are omitted (but the PyConfig_Read() call is retained), it runs, and outputs sys.path - but with the 2 custom module search paths missing.

Analysis

As best as I can make out, the regression was introduced by 99fcf15 (cc @zooba) as part of a refactor of getpath().

Your environment

Observed in Python 3.11.0b1 on macOS 12.3.1 on M1 hardware; however, I don't think there's anything Mac specific about it.

FWIW, I discovered this while porting the support library used by Briefcase to publish macOS and iOS apps.

@freakboy3742 freakboy3742 added the type-bug An unexpected behavior, bug, or error label May 18, 2022
@zooba
Copy link
Member

zooba commented May 18, 2022

I need to look closer at the second case, but in the first case, you've created an isolated configuration which (by default) ignores environment variables. You're then injecting the value for an environment variable, but haven't changed the flag that says to ignore them. So I think that's actually a bugfix.

Second case may be legit, but I haven't dug into it yet.

@zooba
Copy link
Member

zooba commented May 18, 2022

Oh right, the second case is the actual breaking change that had to happen, which is that PyConfig_Read can't actually read the full list of paths from the environment. module_search_paths are calculated on initialise now, so likely what's happening is that by appending to the empty list, you're overriding the rest of the search and not getting any of the essential paths.

This one clearly has a compatibility impact. I don't think it's unreasonable for things to change here, as embedders are inherently using version-specific APIs, but I'm not actually sure there's an alternative for "mostly use the defaults, but also include these paths". Only really "I know where the files are, here are the paths, don't put anything else in there".

Best proposal I've got is that Initialize could insert its paths before any that have been added by the caller, provided they aren't already in there (for other reasons, the init call has to be idempotent). Probably needs another flag to disable that and treat a preconfigured search path as, well, preconfigured.

@freakboy3742
Copy link
Contributor Author

The explanation of the first case makes sense; I'll admit I was a little hazy on whether setting pythonpath_env counted as "reading the environment", so it's good to know this was an error of usage.

As for the second case - the example code I've provided is essentially identical to the "more complete example" given in the docs - based on that, as a user I'd read it as an intended/supported use case, which means this is either (a) it's a regression; (b) a new API is needed to accomplish the same thing; or (c) it's a backwards incompatible change. Either way, the documentation needs to be updated to indicate what is/isn't possible.

@zooba
Copy link
Member

zooba commented May 18, 2022

I'll admit I was a little hazy on whether setting pythonpath_env counted as "reading the environment", so it's good to know this was an error of usage.

It was never really defined, but the implementation definitely changed from "ignore environment means do not read" to "ignore environment means do not use". Virtually none of this has been clearly defined, unfortunately. At least now it's "defined" by an easy to read Python script (well, compared to the absolute spaghetti that was the previous implementation).

Either way, the documentation needs to be updated to indicate what is/isn't possible.

For sure. I don't know how many places we need to put it, but at least one point should mention that you can't rely on reading values back out of the config unless you put them in there or you've already called Initialize on it.

We should also clone that example into a test somewhere to ensure we don't break it in the future.

I suspect, skimming through the getpath.py I linked above, that we may be able to just fix this by checking for module_search_paths_set instead of an empty pythonpath/module_search_paths list, and if it says it's not set but there are paths, assume they come after the calculated paths. Callers can then set that field to prevent any further calculations, which should be the same as in previous releases, and the only change is that you can now no longer insert new paths ahead of the standard ones before initialization. @vstinner does this sound like an acceptable change? (It definitely seems better than the current state)

@vstinner
Copy link
Member

I added module_search_paths_set to make sure that it's possible to fully control module_search_paths with PyConfig: completely ignore getpath path calculation. It's documented at: https://docs.python.org/dev/c-api/init_config.html#python-path-configuration

I didn't follow the getpath.c rewrite into Python getpath.py. If I read correctly this issue, the second example (PyWideStringList_Append(&config.module_search_paths, ...)) given in the doc is no longer working and so the doc should be updated: https://docs.python.org/dev/c-api/init_config.html#c.Py_InitializeFromConfig

If it's an incompatible change made on purpose, it should be documented at: https://docs.python.org/dev/whatsnew/3.11.html

@zooba
Copy link
Member

zooba commented May 19, 2022

Yes, we knew all of that. I'll take that as a sign that you're no longer concerned with this area and I can just decide it on my own.

zooba added a commit to zooba/cpython that referenced this issue May 19, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 19, 2022
…set] fields (pythonGH-92980)

(cherry picked from commit 403d16f)

Co-authored-by: Steve Dower <[email protected]>
@vstinner
Copy link
Member

Yes, we knew all of that. I'll take that as a sign that you're no longer concerned with this area and I can just decide it on my own.

I really appreciate that you did the final part of the old PEP 432 (10 years old) by rewriting getpath.c in Python. I hated the C code, it was unsafe, hard to maintain and I didn't like the fact that the whole file had a different implement just for Windows. It's great that the same file can now be used on all platforms!

When I completed PEP 587 PyConfig implementation, I planned to do that. I wrote a whole implementation. But Nick Coghlan requires me to write a whole new PEP just to rewrite getpath.c to Python. I just gave up, I was exhausted by the time spent on that topic! But I pushed the changes allowing to modify PyConfig after Python initialization: the Python API using dict, but also some subtle changes in getpath.c and PyConfig implementation to be able to override the "Path Configuration"... In secret, I had hope that you might want to do the work, and you did it ;-)

I wish I could cover all areas of Python that I like, but for my health and mental health, I prefer to restrict myself to a limited numbers of topics. So again, thanks for finally doing that, it's really a great achivement! My favorite part is the test suite, I really love it ;-)

@vstinner
Copy link
Member

@freakboy3742: Thanks for testing beta1, thanks to that, we can fix it and @zooba just fixed it ;-)

miss-islington added a commit that referenced this issue May 19, 2022
…ields (GH-92980)

(cherry picked from commit 403d16f)

Co-authored-by: Steve Dower <[email protected]>
@freakboy3742
Copy link
Contributor Author

@vstinner @zooba Thanks for the fast response, folks! I noticed a couple of minor typos in the new docs - but I can confirm the updated approach works as expected in my test case.

zooba added a commit to zooba/cpython that referenced this issue May 23, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 23, 2022
(cherry picked from commit 6a6f823)

Co-authored-by: Steve Dower <[email protected]>
miss-islington added a commit that referenced this issue May 23, 2022
(cherry picked from commit 6a6f823)

Co-authored-by: Steve Dower <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants