Skip to content

shutil.which wrong result on Windows #68693

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bobjalex mannequin opened this issue Jun 24, 2015 · 21 comments
Closed

shutil.which wrong result on Windows #68693

bobjalex mannequin opened this issue Jun 24, 2015 · 21 comments
Labels
3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@bobjalex
Copy link
Mannequin

bobjalex mannequin commented Jun 24, 2015

BPO 24505
Nosy @pfmoore, @tjguk, @bitdancer, @zware, @SpecLad, @eryksun, @zooba, @bharel
Files
  • which2.py: A sample of 'which' function corrected
  • issue 24505 proposed patch windows cmd semantics.patch: Proposed patch following cmd.exe semantics
  • issue-24505-proposed-patch-2.patch
  • new_which_bug_files.zip: ZIP file with suggested file updates
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2015-06-24.22:09:39.162>
    labels = ['type-bug', '3.8', '3.9', '3.10', 'library', 'OS-windows']
    title = 'shutil.which wrong result on Windows'
    updated_at = <Date 2021-03-20.14:30:08.693>
    user = 'https://bugs.python.org/bobjalex'

    bugs.python.org fields:

    activity = <Date 2021-03-20.14:30:08.693>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)', 'Windows']
    creation = <Date 2015-06-24.22:09:39.162>
    creator = 'bobjalex'
    dependencies = []
    files = ['39805', '41338', '41459', '42365']
    hgrepos = []
    issue_num = 24505
    keywords = ['patch']
    message_count = 20.0
    messages = ['245780', '245792', '245885', '245887', '252329', '252341', '252355', '252358', '252359', '252367', '256242', '256373', '256613', '257248', '258413', '258415', '262854', '262875', '348640', '389137']
    nosy_count = 10.0
    nosy_names = ['paul.moore', 'tim.golden', 'r.david.murray', 'zach.ware', 'SpecLad', 'eryksun', 'steve.dower', 'bobjalex', 'bar.harel', 'tobytobkin']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue24505'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    @bobjalex
    Copy link
    Mannequin Author

    bobjalex mannequin commented Jun 24, 2015

    Python session with 3.5b2

    Showing existing error:
    >>> from shutil import which
    
    Works OK
    >>> which("python")
    'C:\\Python27\\python.EXE'
    
    Also works OK
    >>> which('C:\\Python27\\python.EXE')
    'C:\\Python27\\python.EXE'
    
    Fails
    >>> which('C:\\Python27\\python')
    >>>
    
    Showing better results with my fix (attached):
    >>> from which2 import which
    >>> which("python")
    'C:\\Python27\\python.exe'
    >>> which('C:\\Python27\\python.exe')
    'C:\\Python27\\python.exe'
    >>> which('C:\\Python27\\python')
    'C:\\Python27\\python.exe'

    Problem is with the short-circuiting code near the beginning of the function. It fails to check the extensions for Windows when short-circuited.

    My fix has a few other improvements -- efficiency and nicer output without those ugly upper-case extensions.

    @bobjalex bobjalex mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jun 24, 2015
    @bitdancer
    Copy link
    Member

    Could you please submit your proposal as a patch? (A diff -u against the existing version).

    @bobjalex
    Copy link
    Mannequin Author

    bobjalex mannequin commented Jun 27, 2015

    Hi R. David --

    My report is just to notify y'all of a bug that I noticed. The bug is
    causing me no problem, and it's your option as to whether to fix it or not.

    I offered a fix, but I haven't the time to perform diffs, etc. You could
    make that diff yourself, since I sent my modified "which" function. Just
    replace the existing "which" function in the version of shutil you would
    like to fix, and perform the diff.

    Bob

    On Wed, Jun 24, 2015 at 8:26 PM, R. David Murray <[email protected]>
    wrote:

    R. David Murray added the comment:

    Could you please submit your proposal as a patch? (A diff -u against the
    existing version).

    ----------
    nosy: +r.david.murray


    Python tracker <[email protected]>
    <http://bugs.python.org/issue24505\>


    @bitdancer
    Copy link
    Member

    That's fine. Perhaps someone will be interested enough to pick this up and work on it.

    @bitdancer bitdancer added the easy label Jun 27, 2015
    @bharel
    Copy link
    Mannequin

    bharel mannequin commented Oct 5, 2015

    Unfortunately that patch will not help for files like "python.exe.exe", and double ext which sometimes happens.
    I'm on it. Should take me a day or two, more problems might arise along the way. Thanks Bob for alerting.

    Regarding the uppercase, I believe it is better to leave the casing as it is. It should be up to the user to mess with it as converting case is a one way ticket. If someone entered an uppercase, perhaps he wants it like that. Also it's less prune to errors and behaves the same for Unix and Windows.

    @eryksun
    Copy link
    Contributor

    eryksun commented Oct 5, 2015

    Please don't default to searching the current directory:

        if is_windows:
            # The current directory takes precedence on Windows.
            if not os.curdir in path:
                path.insert(0, os.curdir)

    Add a keyword option to enable this, with a warning that including the current directory isn't secure. This option should not be able to override the environment variable NoDefaultCurrentDirectoryInExePath.

    @bitdancer
    Copy link
    Member

    Well, that's a newish situation for Windows, something I certainly wasn't aware of. Which is trying to find the executable that would be found if typed at the prompt. So, what is the correct algorithm for emulating how Windows does that search? (Or some way to hook directly in to Windows search...)

    @zooba
    Copy link
    Member

    zooba commented Oct 5, 2015

    The Win32 API function linked by eryksun is the correct way to determine whether to include the current directory in the search - the doc specifically refers to matching cmd.exe's behaviour (searching ".;%PATH%" vs. just "%PATH%").

    The PATHEXT behaviour should be to look for the name as provided first, followed be appending each extension. So looking for "python" will look for ["python", *("python" + p for p in getenv('PATHEXT').split(';'))] in that order.

    @eryksun
    Copy link
    Contributor

    eryksun commented Oct 5, 2015

    My suggestion is only for 3.5 / 3.6, because Windows XP is no longer supported. Starting with Windows Vista, both cmd.exe and CreateProcess support this behavior. I realize that disabling searching the current director by default is going beyond what cmd.exe does, which is to require opting in via the environment variable. But Python can choose to follow the more-modern behavior of PowerShell as a precedent here.

    As Steve mentioned, technically a program isn't supposed to check for the environment variable, but instead call NeedCurrentDirectoryForExePath. This would require either adding a built-in function, e.g. to the _winapi module, or require ctypes:

        >>> kernel32.NeedCurrentDirectoryForExePathW('command')
        1
        >>> os.environ['NoDefaultCurrentDirectoryInExePath'] = '1'
        >>> kernel32.NeedCurrentDirectoryForExePathW('command')
        0

    Note that it requires first normalizing the path to replace slashes with backslashes, which is unusual for the Windows API.

        >>> kernel32.NeedCurrentDirectoryForExePathW(r'.\command')
        1
        >>> kernel32.NeedCurrentDirectoryForExePathW('./command')
        0

    @bharel
    Copy link
    Mannequin

    bharel mannequin commented Oct 5, 2015

    Thanks eryksun.

    Regarding the slashes, it should be fine as the input should be with
    backslashes anyway. We don't need to fix the input as fixing it would
    result in inaccurate data. Just as it won't work without backslashes on
    CMD, it shouldn't work here.

    On Mon, Oct 5, 2015 at 11:43 PM eryksun <[email protected]> wrote:

    eryksun added the comment:

    My suggestion is only for 3.5 / 3.6, because Windows XP is no longer
    supported. Starting with Windows Vista, both cmd.exe and CreateProcess
    support this behavior. I realize that disabling searching the current
    director by default is going beyond what cmd.exe does, which is to require
    opting in via the environment variable. But Python can choose to follow the
    more-modern behavior of PowerShell as a precedent here.

    As Steve mentioned, technically a program isn't supposed to check for the
    environment variable, but instead call NeedCurrentDirectoryForExePath. This
    would require either adding a built-in function, e.g. to the _winapi
    module, or require ctypes:

    \>\>\> kernel32.NeedCurrentDirectoryForExePathW('command')
    1
    \>\>\> os.environ['NoDefaultCurrentDirectoryInExePath'] = '1'
    \>\>\> kernel32.NeedCurrentDirectoryForExePathW('command')
    0
    

    Note that it requires first normalizing the path to replace slashes with
    backslashes, which is unusual for the Windows API.

    \>\>\> kernel32.NeedCurrentDirectoryForExePathW(r'.\\command')
    1
    \>\>\> kernel32.NeedCurrentDirectoryForExePathW('./command')
    0
    

    ----------


    Python tracker <[email protected]>
    <http://bugs.python.org/issue24505\>


    @tobytobkin
    Copy link
    Mannequin

    tobytobkin mannequin commented Dec 11, 2015

    I'm working on a patch and an accompanying unit test for this now. Should have it out today.

    @tobytobkin
    Copy link
    Mannequin

    tobytobkin mannequin commented Dec 14, 2015

    Hopefully this isn't too much of an amateur question, but I ran into a semantics issue: should which be imitating the semantics of the Windows shell or of CreateProcess[3]? The current implementation of shutil.which implies Windows shell, but Python uses CreateProcess in subprocess for executing commands, not cmd.exe.

    In order to correctly emulate the behavior of the Windows command search sequence[1] (e.g. for cmd.exe or Powershell), one needs to check whether or not a given command matches one of the internal Windows shell commands. For instance, if we have an executable at C:\xyz\chdir.exe, the following outputs should be observed from which if our current directory is C:\xyz:

    >>> which('chdir')
    (none)
    
    >>> which('.\\chdir')
    'C:\\xyz\\chdir.exe'

    On the other hand, CreateProcess[3] would work this way:

    CreateProcess(NULL, "chdir", ...) --> executes C:\xyz\chdir.exe
    CreateProcess(NULL, "chdir.exe", ...) --> executes C:\xyz\chdir.exe

    There are other semantic differences as well. For example, CreateProcess will not do path extension of e.g. "abc" to "abc.bat", but Powershell and cmd will.

    Which semantics do I follow? Powershell/cmd or CreateProcess?

    [1] Subsection "Command Search Sequence" of https://technet.microsoft.com/en-us/library/cc723564.aspx#XSLTsection127121120120
    [2] Subsection "Internal and External Commands" of https://technet.microsoft.com/en-us/library/cc723564.aspx#XSLTsection127121120120
    [3] https://msdn.microsoft.com/en-us/library/windows/desktop/ms682425(v=vs.85).aspx

    @tobytobkin
    Copy link
    Mannequin

    tobytobkin mannequin commented Dec 17, 2015

    Patch and tests attached. All Python tests passed on Windows 8 and Ubuntu 14.04LTS.

    @tobytobkin
    Copy link
    Mannequin

    tobytobkin mannequin commented Dec 31, 2015

    Patch incorporating Eryk Sun's feedback for various Windows-specific behavior attached.

    @zooba
    Copy link
    Member

    zooba commented Jan 16, 2016

    I'm good with the patch, except we can't use ctypes in the standard library. NeedCurrentDirectoryForExePathW will need to be wrapped in Modules/_winapi.c.

    Should be fairly simple to do, though it probably requires learning argument clinic (which admittedly I haven't done enough to write the function without copying from elsewhere). I may get back to it eventually, but if someone posts the implementation separately I'm happy to merge two patches to get this in.

    @tobytobkin
    Copy link
    Mannequin

    tobytobkin mannequin commented Jan 16, 2016

    Steve, yeah I saw that and started playing with Clinic last night. I'll have the updated patch out soon--I've just been slow because I've been flying around to interviews trying to get my first job out of college and it's been eating all my free time.

    @bobjalex
    Copy link
    Mannequin Author

    bobjalex mannequin commented Apr 4, 2016

    Since there seems to be ongoing work on the "which"
    function, here are a few more thoughts on this function's
    future:

    • The existing version does not prepend the current
      directory to the path if it is already in the path.
      If the current directory is in the path but is not
      the first element, it will not be the first directory
      searched. It seems that the desired behavior is to
      search the current directory first, so the current
      directory should *always* be prepended. The existing
      "which" function already has an optimization to only
      search each directory once, so it's not a problem if
      the current directory is unconditionally prepended
      and may end up in there twice. This change would
      actually be a "correction", since the doc says the
      current directory is prepended

    • The function should always return an absolute path,
      which is the behavior of the Unix(1) "which" command
      and, I think, is the typical expected behavior of a
      "which"-type request. The existing implementation
      returns a relative path in certain cases, such as if
      the file is found via a relative directory reference
      in the path. This change is not inconsistent with
      the doc, since the doc does not address it.

    • It would be nice if the extension added when the
      given command has no extension were lower case,
      instead of the upper case extension retrieved from
      the PATHEXT environment variable. Several other
      "which" implementations work that way (e.g.
      see Go's os/exec.LookPath function), producing
      a more aesthetically pleasing name, as well as
      being more consistent with naming practices of the
      last 20+ years. The shocking-looking upper case
      sxtensions remind me of VERY OLD DOS programs :-)
      This presents no conflict with the doc, and does
      not affect "correctness" since Windows file names
      are case-independent.

      A previous commenter objected to adding lower case
      extensions, but consider:

      • The current version never returns the extension
        case of the actual file. It returns the extension
        of the command string passed to the function,
        if any, otherwise it adds the extension from the
        PATHEXT environment variable, either of which
        might not match the actual file.
      • Returning the actual extension of the found file
        might be nice, but would require additional I/O;
        added expense for little return. This is not
        done in the current version.
      • The PATHEXT variable that ships with Windows
        contains the allowable extensions in upper case,
        an old DOS artifact. Few executable files these
        days have uppercase extensions. Using a lower
        case extension when the function has to invent
        one is a "modernization".
    • It would be nice if the returned file path were
      normalized. Currently, sometimes an unnormalized
      path is returned with a leading ".\".

    I did write an update to "which" with the above changes,
    and also updated the unit tests with:

    • 2 new tests to catch the bug that is the subject of
      this issue.
    • some tests were updated for the small changes such
      as normalization and lower case added extensions.
      A zip file is attached with my updates. I'm not an
      official "contributor", but feel free incorporate the
      contents in any way you deem appropriate. The files are:

    shutil.py
    updated shutil module
    shutil.py_3_5_1
    existing 3.5.1 shutil module -- basis for updates
    test_shutil_for_current_w_added_tests.py
    unit tests for existing 3.5.1 version of shutil with
    new tests to catch this bug
    test_shutil_for_new_version.py
    unit tests for attached updated version of shutil
    test_shutil_3_5_1.py
    existing 3.5.1 unit tests -- basis for updates

    Bob

    @bobjalex
    Copy link
    Mannequin Author

    bobjalex mannequin commented Apr 4, 2016

    Oops, clarification...

    I just reread my kind of long previous post, and realized it wasn't very explicit that anything concerning file extensions or prepending the current directory to the PATH apply to Windows only; not Unix (of course).

    The "returning absolute, normalized paths" suggestions are multi-platform

    I only tested the modified code on Windows.

    @vstinner
    Copy link
    Member

    This issue is not newcomer friendly, I remove the easy keyword.

    @vstinner vstinner removed the easy label Jul 29, 2019
    @eryksun
    Copy link
    Contributor

    eryksun commented Mar 20, 2021

    Regarding Toby's patch:

    Probably _is_windows_nt_internal_command() isn't needed or desired. It's more of a command-line parsing issue, rather than a file-search issue. For example, CMD will search for an internal name if it's quoted in double quotes in the command line, such as "dir" spam, and where.exe doesn't reserve the names of CMD's internal functions. If it's kept, note that the following four names are missing from the nt_internal_commands list: 'BREAK', 'MKLINK', 'VERIFY', 'VOL'.

    The patch also makes a couple incorrect assumptions, corrected as follows:

    * A filename with an extension will be found and executed 
      even if the extension is not in PATHEXT.
    * The PATHEXT extensions are tried even if a name already
      has an extension.
    

    @eryksun eryksun added 3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes labels Mar 20, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @eryksun
    Copy link
    Contributor

    eryksun commented Apr 8, 2023

    This issue was resolved by #103179.

    @eryksun eryksun closed this as completed Apr 8, 2023
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants