Cannot access to customized paths within .pth file #79312
I want to manage all the packages I install, so every time I add a package I pass "--target" so that it is downloaded to a directory of my choosing. I then wrote that directory into a .pth file located in "/Python36/Lib/site-packages", so that I can still import the packages even though they are not inside the "Python36" folder. However, my Windows user name is Chinese, which means the customized path contains Chinese characters, and so the .pth file is also encoded with 'gbk'. Every time I import these packages I get "UnicodeDecodeError: 'gbk' can't decode byte xxx...". Fortunately I found the cause and worked around it: Python reads .pth files without specifying an encoding. The code is in "Python36/Lib/site.py":

    def addpackage(sitedir, name, known_paths):
        if known_paths is None:
            known_paths = _init_pathinfo()
            reset = True
        else:
            reset = False
        fullname = os.path.join(sitedir, name)
        try:
            # an explicit encoding='utf-8' should be passed here
            f = open(fullname, "r")
        except OSError:
            return
        # other code

After making this change, everything works. |
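For illustration, the change described in the report amounts to opening the .pth file with an explicit encoding instead of the platform default. The sketch below assumes a hypothetical helper name, `read_pth_entries`, which is not part of `site.py`; it only collects path entries and ignores the special handling `site.py` gives to lines beginning with `import`.

```python
def read_pth_entries(fullname, encoding="utf-8"):
    """Return the path entries of a .pth file, decoded with an explicit encoding.

    Hypothetical helper for illustration only; it is not part of site.py.
    """
    try:
        # The explicit encoding is the whole point: without it, open() uses
        # the locale's preferred encoding (for example 'gbk').
        f = open(fullname, "r", encoding=encoding)
    except OSError:
        return []
    entries = []
    with f:
        for line in f:
            line = line.rstrip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            entries.append(line)
    return entries
```

With this approach a file that is not valid UTF-8 still raises UnicodeDecodeError during iteration, so it fixes only the case where the .pth file was actually saved as UTF-8.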
Can you save your file in gbk encoding? That will be an immediate fix. I don't know that we can/should change the encoding we read without checking with everyone who writes out .pth files. (+Jason as a start here, but I suspect there are more tools that write them.) We could add a handler for UnicodeDecodeError that falls back on utf-8? I think that's reasonable. |
I'll mark this easy as well, since adding that handler is straightforward. Unless someone knows a reason we shouldn't do that either. |
Hello Valentin Zhao, do you have time to fix it? If not, I can create a PR.
I'm only aware of one tool that writes .pth files, and that's setuptools, and it always writes ASCII (assuming package names are ASCII), so any encoding handling should be fine there.
Yes, reasonable, but maybe we should consider instead _preferring_ UTF-8 and fall back to default encodings. That would be my preference. |
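The two proposals in this exchange (locale encoding with a UTF-8 fallback, or UTF-8 with a locale fallback) differ only in the order of the candidate encodings. A minimal sketch, assuming a hypothetical function `decode_pth_bytes` and using `locale.getpreferredencoding(False)` as a stand-in for "the default encoding":

```python
import locale


def decode_pth_bytes(data, encodings=None):
    """Decode raw .pth bytes, trying each candidate encoding in order.

    Returns (text, encoding_used). Hypothetical helper, not part of site.py.
    """
    if encodings is None:
        # Assumption: prefer UTF-8, then fall back to the locale encoding.
        encodings = ("utf-8", locale.getpreferredencoding(False))
    if not encodings:
        raise ValueError("at least one candidate encoding is required")
    last_error = None
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError as exc:
            last_error = exc
    # Every candidate failed; re-raise the last decoding error.
    raise last_error
```

One argument for trying UTF-8 first is that UTF-8 decoding is strict and rejects most text written in another encoding, whereas some legacy codecs (single-byte code pages in particular) accept almost any byte sequence and would return wrong text without raising.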
Also, I would argue that this is an enhancement request and not a bug - that the prior expectation was that the .pth file is encoded in whatever encoding the system expects by default, and that adding support for a standardized encoding for .pth files is a new feature. As another aside: Valentin, the technique you're using to manage packages is likely to run into issues with certain packages - in particular any packages that rely on their own |
I would rather wait for you to fix it, since it is not urgent.
|
I tried to create a PR for it. However, I don't know how to handle the code at https://github.com/python/cpython/blob/d4c76d960b/Lib/site.py#L159. How can we check for UnicodeDecodeError when we have only opened the file? I could use readlines(), but that may use more memory than before (I'm not sure whether that matters in this case).
|
The problem you've encountered is that previously the file was assumed to be one encoding and would fail if it was not that encoding... so it was possible to lazy-load the file and process each line. In the new model, where you need to evaluate the viability of the file in one of two candidate encodings, you'll necessarily need to read the entire file once before processing its contents. Therefore, I recommend one of these options:
I recommend a combination of (1) and then (3) in the future. That is:

    def determine_best_encoding(f, encodings=('utf-8', sys.getdefaultencoding())):
        """
        Attempt to read and decode all of stream f using the encodings
        and return the first one that succeeds. Rewinds the file.
        """

    f = open(..., 'rb')
    encoding = determine_best_encoding(f)
    if encoding != 'utf-8':
        warnings.warn("Detected pth file with unsupported encoding", DeprecationWarning)
    f = io.TextIOWrapper(f, encoding)

Then, in a future version, dropping support for local encodings, all of that code can be replaced with |
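The helper in the comment above is given only as a signature and docstring. One possible body is sketched here under stated assumptions: `sys.getdefaultencoding()` returns 'utf-8' on Python 3, so the sketch substitutes `locale.getpreferredencoding(False)` as the fallback, which is a guess at the intent; raising ValueError when nothing decodes, and the wrapper name `open_pth`, are also choices made for this example.

```python
import io
import locale
import warnings


def determine_best_encoding(f, encodings=None):
    """Read all of binary stream f, return the first encoding that decodes it.

    Rewinds the stream before returning. Sketch only, not the site.py code.
    """
    if encodings is None:
        # Assumption: locale encoding as the fallback candidate.
        encodings = ("utf-8", locale.getpreferredencoding(False))
    data = f.read()  # the whole file must be read once to validate it
    f.seek(0)
    for encoding in encodings:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    raise ValueError("none of the candidate encodings can decode the stream")


def open_pth(fullname, encodings=None):
    """Open a .pth file as text, warning when it is not UTF-8 (hypothetical)."""
    f = open(fullname, "rb")
    try:
        encoding = determine_best_encoding(f, encodings)
    except Exception:
        f.close()
        raise
    if encoding != "utf-8":
        warnings.warn("Detected pth file with unsupported encoding",
                      DeprecationWarning, stacklevel=2)
    return io.TextIOWrapper(f, encoding)
```

Because the decoded text is discarded after validation, the file ends up being decoded twice; for files as small as typical .pth files that cost is likely negligible, which bears on the memory concern raised earlier in the thread.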
There is no "find best encoding" code, which is why so much code out there uses chardet. :) This might also tie into issue bpo-33944 and the idea of rethinking .pth files. |
I will fix this issue once we have consensus on the future of .pth files in bpo-33944. |
Resolved in 3.12.4 via #77102 (which added support for utf-8 encoded .pth files) |