Cannot access to customized paths within .pth file #79312

ValentinZhao · 2018-11-01T09:56:38Z

BPO	35131
Nosy	@brettcannon, @pfmoore, @jaraco, @vstinner, @tjguk, @zware, @zooba, @Windsooon
Files	[IMG_20181101_173328_[email protected]

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2018-11-01.09:56:37.722>
labels = ['easy', '3.8', 'type-bug', '3.7', 'OS-windows']
title = 'Cannot access to customized paths within .pth file'
updated_at = <Date 2018-11-29.14:57:52.838>
user = 'https://bugs.python.org/ValentinZhao'

bugs.python.org fields:

activity = <Date 2018-11-29.14:57:52.838>
actor = 'vstinner'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Windows']
creation = <Date 2018-11-01.09:56:37.722>
creator = 'Valentin Zhao'
dependencies = []
files = ['47899']
hgrepos = []
issue_num = 35131
keywords = ['easy']
message_count = 11.0
messages = ['329050', '329172', '329173', '329178', '329198', '329199', '329497', '329498', '330058', '330113', '330201']
nosy_count = 9.0
nosy_names = ['brett.cannon', 'paul.moore', 'jaraco', 'vstinner', 'tim.golden', 'zach.ware', 'steve.dower', 'Windson Yang', 'Valentin Zhao']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue35131'
versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

ValentinZhao · 2018-11-01T09:56:37Z

I want to manage all the packages I install myself, so every time I add a package I pass "--target" so that it is downloaded to a directory of my choosing. I then wrote that directory into a .pth file located in "/Python36/Lib/site-packages", so that I can still access all the packages even though they are not inside the "Python36" folder.

However, my current Windows user name is Chinese, which means the custom path mentioned above contains Chinese characters, so the .pth file is also encoded with 'gbk'. Every time I try to import these packages I get "UnicodeDecodeError: 'gbk' can't decode byte xxx...".

Fortunately, I have found the cause and worked around the problem: Python reads .pth files without specifying any encoding. The code is located in "Python36/Lib/site.py":

def addpackage(sitedir, name, known_paths):
    if known_paths is None:
        known_paths = _init_pathinfo()
        reset = True
    else:
        reset = False
    fullname = os.path.join(sitedir, name)
    try:
        # encoding='utf-8' should be passed as an extra argument here
        f = open(fullname, "r")
    except OSError:
        return
    # rest of the function unchanged

After I made this change, everything works.
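The failure described above can be reproduced without touching site.py. The following is a minimal sketch, not code from the report: it writes a hypothetical .pth file as UTF-8 and reads it back with an explicit 'gbk' encoding, which stands in for a Windows system whose default encoding is gbk. The file name, the path inside it, and the helper `read_pth` are all made up for illustration.

```python
import os
import tempfile


def read_pth(path, encoding=None):
    """Read a .pth-style file with a plain open(), optionally forcing an encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read().splitlines()


# Hypothetical .pth file holding one path with a Chinese character, saved as UTF-8.
tmpdir = tempfile.mkdtemp()
pth_file = os.path.join(tmpdir, "custom.pth")
with open(pth_file, "w", encoding="utf-8") as f:
    f.write("D:/pkgs/中")

# Simulate a gbk system default by passing the encoding explicitly.
try:
    read_pth(pth_file, encoding="gbk")
    gbk_failed = False
except UnicodeDecodeError:
    gbk_failed = True

# Reading the same bytes as UTF-8 succeeds.
utf8_lines = read_pth(pth_file, encoding="utf-8")
```

Whether a mismatched read raises or silently produces wrong characters depends on the exact bytes, so a given path may also decode "successfully" into a directory name that does not exist.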

zooba · 2018-11-02T23:40:16Z

Can you save your file in gbk encoding? That will be an immediate fix.

I don't know that we can/should change the encoding we read without checking with everyone who writes out .pth files. (+Jason as a start here, but I suspect there are more tools that write them.)

We could add a handler for UnicodeDecodeError that falls back on utf-8? I think that's reasonable.

zooba · 2018-11-02T23:40:59Z

I'll mark this easy as well, since adding that handler is straightforward. Unless someone knows a reason we shouldn't do that either.

Windsooon · 2018-11-03T03:47:10Z

Hello Valentin Zhao, do you have time to fix it? Otherwise I can create a PR.

jaraco · 2018-11-03T14:05:00Z

I'm only aware of one tool that writes .pth files, and that's setuptools, and it always writes ASCII (assuming package names are ASCII), so any encoding handling should be fine there.

> We could add a handler for UnicodeDecodeError that falls back on utf-8?

Yes, reasonable, but maybe we should consider instead _preferring_ UTF-8 and fall back to default encodings. That would be my preference.
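To illustrate the ordering being proposed here, a minimal sketch follows. It is not the code in site.py; the helper name `read_pth_lines` is hypothetical, and `locale.getpreferredencoding(False)` is assumed to be a reasonable stand-in for "the default encoding".

```python
import locale
import os
import tempfile


def read_pth_lines(path):
    """Return the lines of a .pth file, trying UTF-8 first, then the locale encoding."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Fallback: whatever encoding the system uses by default (assumption).
        text = raw.decode(locale.getpreferredencoding(False))
    return text.splitlines()


# Usage with a hypothetical UTF-8 encoded .pth file.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.pth")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("D:/pkgs/中\n")
demo_lines = read_pth_lines(demo_path)
```

Reading the bytes once and decoding in memory avoids opening the file twice, at the cost of holding the whole file in memory, which is usually negligible for .pth files.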

jaraco · 2018-11-03T14:12:34Z

Also, I would argue that this is an enhancement request and not a bug - that the prior expectation was that the .pth file is encoded in whatever encoding the system expects by default, and that adding support for a standardized encoding for .pth files is a new feature.

As another aside: Valentin, the technique you're using to manage packages is likely to run into issues with certain packages - in particular any packages that rely on their own .pth files to invoke behavior, such as future_fstrings (https://pypi.org/project/future-fstrings/). I learned about this issue in (jaraco/pip-run#29), which is why the rwt project adds a sitecustomize.py to the target directory that ensures .pth files are run. Just FYI.
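As a rough illustration of the sitecustomize.py approach mentioned above (a sketch under assumptions, not the actual file that project ships): `site.addsitedir()` adds a directory to `sys.path` and also processes the .pth files found in it, which simply appending to `sys.path` does not do. The directory and file names below are hypothetical.

```python
import os
import site
import sys
import tempfile

# Hypothetical target directory, standing in for the one passed to "--target".
target_dir = tempfile.mkdtemp()
extra_dir = os.path.join(target_dir, "extra")
os.mkdir(extra_dir)

# A .pth file inside the target directory that adds one more (relative) path.
with open(os.path.join(target_dir, "demo.pth"), "w", encoding="ascii") as f:
    f.write("extra\n")

# What a sitecustomize.py could do: register the directory as a site directory,
# so that its .pth files are processed as well.
site.addsitedir(target_dir)

extra_on_path = os.path.abspath(extra_dir) in sys.path
```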

ValentinZhao · 2018-11-09T06:42:13Z

I would rather just wait for you to fix it, since it is not urgent.

Windsooon · 2018-11-09T06:58:59Z

I tried to create a PR for it. However, I don't know how to handle the code at https://github.com/python/cpython/blob/d4c76d960b/Lib/site.py#L159

How can we check for UnicodeDecodeError when we only open the file? I used readlines(), but it may use more memory than before (I'm not sure that matters in this case).

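try:
    # First attempt: the default encoding (the current behavior).
    with open(fullname, "r") as f:
        data = f.readlines()
except UnicodeDecodeError:
    # Fall back to UTF-8 if the default encoding cannot decode the file.
    with open(fullname, "r", encoding="utf-8") as f:
        data = f.readlines()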

jaraco · 2018-11-18T18:42:30Z

The problem you've encountered is that previously the file was assumed to be one encoding and would fail if it was not that encoding... so it was possible to lazy-load the file and process each line.

In the new model, where you need to evaluate the viability of the file in one of two candidate encodings, you'll necessarily need to read the entire file once before processing its contents.

Therefore, I recommend one of these options:

1. Always read the file in binary mode, ascertain the "best" encoding, then rewind the file and wrap it in a TextIOWrapper for that encoding. Presumably this logic is common--perhaps there's already a routine that does just that.
2. In a try/except block, read the entire content, decoded, into another iterable ... and then have the logic below rely on that content. i.e. f = list(f).
3. Always assume UTF-8 instead of the system encoding. This change would be backward incompatible, so probably isn't acceptable without at least an interim release with a deprecation warning.

I recommend a combination of (1) and then (3) in the future. That is:

def determine_best_encoding(f, encodings=('utf-8', sys.getdefaultencoding())):
    """
    Attempt to read and decode all of stream f using the encodings
    and return the first one that succeeds. Rewinds the file.
    """


f = open(..., 'rb')
encoding = determine_best_encoding(f)
if encoding != 'utf-8':
    warnings.warn("Detected pth file with unsupported encoding", DeprecationWarning)
f = io.TextIOWrapper(f, encoding)

Then, in a future version, dropping support for local encodings, all of that code can be replaced with f = open(..., encoding='utf-8').
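One possible body for the helper outlined above is sketched here. It is an illustration only. Note that `sys.getdefaultencoding()` always returns 'utf-8' on Python 3, so this sketch assumes `locale.getpreferredencoding(False)` as the "system" candidate instead. The `ValueError` raised when nothing matches is also an assumption, not part of the proposal.

```python
import io
import locale


def determine_best_encoding(f, encodings=None):
    """
    Try to decode all of binary stream f with each candidate encoding
    and return the first one that succeeds. Rewinds the file.
    """
    if encodings is None:
        # Assumption: the locale's preferred encoding is the system encoding.
        encodings = ("utf-8", locale.getpreferredencoding(False))
    data = f.read()
    f.seek(0)
    for encoding in encodings:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    raise ValueError("no candidate encoding can decode the stream")


# Usage with in-memory streams standing in for .pth files opened in 'rb' mode.
gbk_stream = io.BytesIO("中".encode("gbk"))
utf8_stream = io.BytesIO("中".encode("utf-8"))
gbk_result = determine_best_encoding(gbk_stream, ("utf-8", "gbk"))
utf8_result = determine_best_encoding(utf8_stream, ("utf-8", "gbk"))
```

Trying UTF-8 first is a deliberate choice: bytes written in a legacy encoding rarely happen to be valid UTF-8, whereas UTF-8 bytes can often be decoded by a legacy codec into wrong characters without any error.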

brettcannon · 2018-11-19T19:52:53Z

There is no "find best encoding" code, which is why so much code out there uses chardet. :)

This might also tie into issue bpo-33944 and the idea of rethinking .pth files.

Windsooon · 2018-11-21T13:23:17Z

I will fix this issue once we reach consensus on the future of .pth files in bpo-33944.

ncoghlan · 2024-10-08T11:07:55Z

Resolved in 3.12.4 via #77102 (which added support for utf-8 encoded .pth files)

ValentinZhao mannequin added the stdlib and type-bug labels Nov 1, 2018

ned-deily added the OS-windows label and removed the stdlib label Nov 2, 2018

zooba added the easy, 3.7 (EOL), and 3.8 (EOL) labels Nov 2, 2018

ezio-melotti transferred this issue from another repository Apr 10, 2022

ncoghlan closed this as completed Oct 8, 2024
