Cannot access to customized paths within .pth file #79312

Closed
ValentinZhao mannequin opened this issue Nov 1, 2018 · 12 comments
Labels: 3.7 (EOL), 3.8 (EOL), easy, OS-windows, type-bug

Comments

@ValentinZhao
Mannequin

ValentinZhao mannequin commented Nov 1, 2018

BPO 35131
Nosy @brettcannon, @pfmoore, @jaraco, @vstinner, @tjguk, @zware, @zooba, @Windsooon
Files
  • [IMG_20181101_173328_[email protected]
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2018-11-01.09:56:37.722>
    labels = ['easy', '3.8', 'type-bug', '3.7', 'OS-windows']
    title = 'Cannot access to customized paths within .pth file'
    updated_at = <Date 2018-11-29.14:57:52.838>
    user = 'https://bugs.python.org/ValentinZhao'

    bugs.python.org fields:

    activity = <Date 2018-11-29.14:57:52.838>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Windows']
    creation = <Date 2018-11-01.09:56:37.722>
    creator = 'Valentin Zhao'
    dependencies = []
    files = ['47899']
    hgrepos = []
    issue_num = 35131
    keywords = ['easy']
    message_count = 11.0
    messages = ['329050', '329172', '329173', '329178', '329198', '329199', '329497', '329498', '330058', '330113', '330201']
    nosy_count = 9.0
    nosy_names = ['brett.cannon', 'paul.moore', 'jaraco', 'vstinner', 'tim.golden', 'zach.ware', 'steve.dower', 'Windson Yang', 'Valentin Zhao']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue35131'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

    @ValentinZhao
    Mannequin Author

    ValentinZhao mannequin commented Nov 1, 2018

    I want to manage all the packages I install in one place, so every time I add a package I pass "--target" and the package is downloaded there. I then listed that directory in a .pth file located in "/Python36/Lib/site-packages", so that I can still access all the packages even though they are not inside the "Python36" folder.

    However, my Windows user name is Chinese, which means the custom path mentioned above contains Chinese characters, so the .pth file is also encoded with 'gbk'. Every time I try to import these packages I get "UnicodeDecodeError: 'gbk' can't decode byte xxx...".

    Fortunately, I have found the cause and worked around the problem: Python reads .pth files without specifying an encoding. The code is located in "Python36/Lib/site.py":

    def addpackage(sitedir, name, known_paths):
        if known_paths is None:
            known_paths = _init_pathinfo()
            reset = True
        else:
            reset = False
        fullname = os.path.join(sitedir, name)
        try:
            # suggested fix: pass the keyword argument encoding='utf-8' here,
            # i.e. open(fullname, "r", encoding="utf-8")
            f = open(fullname, "r")
        except OSError:
            return
        # rest of the function unchanged

    After I made this change, everything works.
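A minimal sketch of the mismatch described in this report, assuming the .pth file was saved as UTF-8 while the reader decodes it as 'gbk'. The helper `try_decode` and the example path are hypothetical and are not part of site.py. Depending on the bytes involved, the wrong codec either raises UnicodeDecodeError or silently yields garbled text; in neither case does the original path come back.

```python
def try_decode(raw: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Hypothetical .pth line: a target directory under a non-ASCII user name.
path = "C:/Users/中/packages"
raw = path.encode("utf-8")  # the bytes a UTF-8 editor would write to disk

as_utf8 = try_decode(raw, "utf-8")  # round-trips cleanly
as_gbk = try_decode(raw, "gbk")     # None or garbled text, not the original

print("utf-8 round trip ok:", as_utf8 == path)
print("gbk round trip ok:", as_gbk == path)
```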

    @ValentinZhao ValentinZhao mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Nov 1, 2018
    @ned-deily ned-deily added OS-windows and removed stdlib Python modules in the Lib dir labels Nov 2, 2018
    @zooba
    Member

    zooba commented Nov 2, 2018

    Can you save your file in gbk encoding? That will be an immediate fix.

    I don't know that we can/should change the encoding we read without checking with everyone who writes out .pth files. (+Jason as a start here, but I suspect there are more tools that write them.)

    We could add a handler for UnicodeDecodeError that falls back on utf-8? I think that's reasonable.

    @zooba
    Member

    zooba commented Nov 2, 2018

    I'll mark this easy as well, since adding that handler is straightforward. Unless someone knows a reason we shouldn't do that either.

    @zooba zooba added easy 3.7 (EOL) end of life 3.8 (EOL) end of life labels Nov 2, 2018
    @Windsooon
    Mannequin

    Windsooon mannequin commented Nov 3, 2018

    Hello Valentin Zhao, do you have time to fix it? If not, I can create a PR.

    @jaraco
    Member

    jaraco commented Nov 3, 2018

    I'm only aware of one tool that writes .pth files, and that's setuptools, and it always writes ASCII (assuming package names are ASCII), so any encoding handling should be fine there.

    > We could add a handler for UnicodeDecodeError that falls back on utf-8?

    Yes, reasonable, but maybe we should instead consider _preferring_ UTF-8 and falling back to the default encoding. That would be my preference.
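A minimal sketch of the "prefer UTF-8, then fall back" ordering, assuming the whole file is read as bytes before decoding. The function `read_pth_lines` and its `fallback_encoding` parameter are hypothetical names used for illustration only, and `locale.getpreferredencoding(False)` is assumed to stand in for the default encoding.

```python
import locale
import os
import tempfile


def read_pth_lines(fullname, fallback_encoding=None):
    """Read all lines, trying UTF-8 first and a fallback encoding second."""
    if fallback_encoding is None:
        # Assumption: the locale encoding plays the role of the default.
        fallback_encoding = locale.getpreferredencoding(False)
    with open(fullname, "rb") as f:
        raw = f.read()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode(fallback_encoding)
    return text.splitlines()


# Write the same hypothetical path in both encodings and read each back.
path = "C:/Users/中/packages"
results = {}
with tempfile.TemporaryDirectory() as tmp:
    for encoding in ("utf-8", "gbk"):
        fullname = os.path.join(tmp, f"demo-{encoding}.pth")
        with open(fullname, "w", encoding=encoding) as f:
            f.write(path + "\n")
        results[encoding] = read_pth_lines(fullname, fallback_encoding="gbk")

print(results)
```

Trying UTF-8 first is the safer order because bytes in a legacy encoding rarely form valid UTF-8, whereas UTF-8 bytes can decode without error under some legacy codecs and produce garbled paths.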

    @jaraco
    Member

    jaraco commented Nov 3, 2018

    Also, I would argue that this is an enhancement request and not a bug - that the prior expectation was that the .pth file is encoded in whatever encoding the system expects by default, and that adding support for a standardized encoding for .pth files is a new feature.

    As another aside: Valentin, the technique you're using to manage packages is likely to run into issues with certain packages - in particular any packages that rely on their own .pth files to invoke behavior, such as future_fstrings (https://pypi.org/project/future-fstrings/). I learned about this issue in (jaraco/pip-run#29), which is why the rwt project adds a sitecustomize.py to the target directory that ensures .pth files are run. Just FYI.

    @ValentinZhao
    Mannequin Author

    ValentinZhao mannequin commented Nov 9, 2018

    I would rather just wait for you to fix it, since it is not urgent.

    @Windsooon
    Mannequin

    Windsooon mannequin commented Nov 9, 2018

    I tried to create a PR for it. However, I don't know how to handle the code at https://github.com/python/cpython/blob/d4c76d960b/Lib/site.py#L159

    How can we catch UnicodeDecodeError when we only open the file? I used readlines(), but it may use more memory than before (I'm not sure whether that matters in this case).

    try:
        with open(fullname, "r") as f:
            data = f.readlines()
    except UnicodeDecodeError:
        with open(fullname, "r", encoding="utf-8") as f:
            data = f.readlines()
    

    @jaraco
    Member

    jaraco commented Nov 18, 2018

    The problem you've encountered is that previously the file was assumed to be one encoding and would fail if it was not that encoding... so it was possible to lazy-load the file and process each line.

    In the new model, where you need to evaluate the viability of the file in one of two candidate encodings, you'll necessarily need to read the entire file once before processing its contents.

    Therefore, I recommend one of these options:

    1. Always read the file in binary mode, ascertain the "best" encoding, then rewind the file and wrap it in a TextIOWrapper for that encoding. Presumably this logic is common--perhaps there's already a routine that does just that.
    2. In a try/except block, read the entire content, decoded, into another iterable ... and then have the logic below rely on that content. i.e. f = list(f).
    3. Always assume UTF-8 instead of the system encoding. This change would be backward incompatible, so probably isn't acceptable without at least an interim release with a deprecation warning.

    I recommend a combination of (1) and then (3) in the future. That is:

    def determine_best_encoding(f, encodings=('utf-8', sys.getdefaultencoding())):
        """
        Attempt to read and decode all of stream f using the encodings
        and return the first one that succeeds. Rewinds the file.
        """
    
    
    f = open(..., 'rb')
    encoding = determine_best_encoding(f)
    if encoding != 'utf-8':
        warnings.warn("Detected pth file with unsupported encoding", DeprecationWarning)
    f = io.TextIOWrapper(f, encoding)

    Then, in a future version, dropping support for local encodings, all of that code can be replaced with f = open(..., encoding='utf-8').
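A sketch of how the `determine_best_encoding` stub above might be filled in, under stated assumptions. The wrapper `open_pth` is a hypothetical name. The sketch also swaps in `locale.getpreferredencoding(False)` as the second candidate, because `sys.getdefaultencoding()` reports 'utf-8' on Python 3 and so would not add a distinct fallback.

```python
import io
import locale
import warnings


def determine_best_encoding(f, encodings=None):
    """Return the first encoding that decodes all of binary stream f.

    Reads the whole stream once, then rewinds it.
    """
    if encodings is None:
        # Assumption: the locale encoding stands in for the system encoding.
        encodings = ("utf-8", locale.getpreferredencoding(False))
    data = f.read()
    f.seek(0)
    for encoding in encodings:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    raise ValueError("no candidate encoding can decode the stream")


def open_pth(f, encodings=None):
    """Wrap binary stream f in a text wrapper using the detected encoding."""
    encoding = determine_best_encoding(f, encodings)
    if encoding != "utf-8":
        warnings.warn("Detected pth file with unsupported encoding",
                      DeprecationWarning)
    return io.TextIOWrapper(f, encoding)


# Usage with an in-memory stream holding a gbk-encoded hypothetical path.
stream = io.BytesIO("C:/Users/中/packages\n".encode("gbk"))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lines = [line.rstrip("\n") for line in open_pth(stream, ("utf-8", "gbk"))]

print(lines, len(caught))
```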

    @brettcannon
    Member

    There is no "find best encoding" code, which is why so much code out there uses chardet. :)

    This might also tie into issue bpo-33944 and the idea of rethinking .pth files.

    @Windsooon
    Mannequin

    Windsooon mannequin commented Nov 21, 2018

    I will fix this issue once we reach consensus on the future of .pth files in bpo-33944.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @ncoghlan
    Contributor

    ncoghlan commented Oct 8, 2024

    Resolved in 3.12.4 via #77102 (which added support for utf-8 encoded .pth files)

    @ncoghlan ncoghlan closed this as completed Oct 8, 2024