Skip to content

Some ntpath/posixpath functions unusable on foreign OS #119671

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
barneygale opened this issue May 28, 2024 · 9 comments
Open

Some ntpath/posixpath functions unusable on foreign OS #119671

barneygale opened this issue May 28, 2024 · 9 comments
Labels
docs Documentation in the Doc dir

Comments

@barneygale
Copy link
Contributor

barneygale commented May 28, 2024

The os.path docs intro includes:

Note: Since different operating systems have different path name conventions, there are several versions of this module in the standard library. The os.path module is always the path module suitable for the operating system Python is running on, and therefore usable for local paths. However, you can also import and use the individual modules if you want to manipulate a path that is always in one of the different formats. They all have the same interface:

  • posixpath for UNIX-style paths
  • ntpath for Windows paths

But we don't explain that some of the functions in ntpath and posixpath aren't suitable for usage on a "foreign" OS, e.g. that using ntpath.realpath() from POSIX is a bad idea.

These functions are safe because they do purely lexical work:

  • basename
  • commonpath
  • commonprefix
  • dirname
  • isabs
  • join
  • normcase
  • normpath
  • split
  • splitdrive
  • splitroot
  • splitext

These functions are probably unsafe because they rely on details of the host OS (e.g. via the os module):

  • abspath
  • exists
  • lexists
  • expanduser
  • expandvars
  • getatime
  • getmtime
  • getctime
  • getsize
  • isfile
  • isdir
  • isjunction
  • islink
  • ismount
  • isdevdrive
  • realpath
  • relpath
  • samefile
  • sameopenfile
  • samestat
  • supports_unicode_filenames

Linked PRs

@barneygale
Copy link
Contributor Author

I'm not sure if/how to solve this.

It could be considered a docs bug - perhaps we need to adjust the os.path docs to only recommend ntpath and posixpath in certain circumstances, and document the safe interface.

Alternatively we could remove the docs mention of the ntpath and posixpath modules (making them quasi-private again), and instead add os.PosixPathParser and os.WindowsPathParser classes that expose only the safe methods. There's something similar in the pathlib ABCs already:

class ParserBase:
"""Base class for path parsers, which do low-level path manipulation.
Path parsers provide a subset of the os.path API, specifically those
functions needed to provide PurePathBase functionality. Each PurePathBase
subclass references its path parser via a 'parser' class attribute.
Every method in this base class raises an UnsupportedOperation exception.
"""
@classmethod
def _unsupported_msg(cls, attribute):
return f"{cls.__name__}.{attribute} is unsupported"
@property
def sep(self):
"""The character used to separate path components."""
raise UnsupportedOperation(self._unsupported_msg('sep'))
def join(self, path, *paths):
"""Join path segments."""
raise UnsupportedOperation(self._unsupported_msg('join()'))
def split(self, path):
"""Split the path into a pair (head, tail), where *head* is everything
before the final path separator, and *tail* is everything after.
Either part may be empty.
"""
raise UnsupportedOperation(self._unsupported_msg('split()'))
def splitdrive(self, path):
"""Split the path into a 2-item tuple (drive, tail), where *drive* is
a device name or mount point, and *tail* is everything after the
drive. Either part may be empty."""
raise UnsupportedOperation(self._unsupported_msg('splitdrive()'))
def splitext(self, path):
"""Split the path into a pair (root, ext), where *ext* is empty or
begins with a begins with a period and contains at most one period,
and *root* is everything before the extension."""
raise UnsupportedOperation(self._unsupported_msg('splitext()'))
def normcase(self, path):
"""Normalize the case of the path."""
raise UnsupportedOperation(self._unsupported_msg('normcase()'))
def isabs(self, path):
"""Returns whether the path is absolute, i.e. unaffected by the
current directory or drive."""
raise UnsupportedOperation(self._unsupported_msg('isabs()'))

@nineteendo
Copy link
Contributor

Also, it's worth noting that there's no problem with relpath(abs1, abs2).

@nineteendo
Copy link
Contributor

nineteendo commented May 28, 2024

Could we maybe move the safe functions to ntpath.pure & posixpath.pure, re-exporting them in ntpath.__init__.py & posixpath.__init__.py for backwards compatibility? Or is *.abc a better name?

@barneygale
Copy link
Contributor Author

barneygale commented May 28, 2024

Also, it's worth noting that there's no problem with relpath(abs1, abs2).

Sure, but we're interested in whether functions are safe overall, not under specific conditions.

@nineteendo
Copy link
Contributor

nineteendo commented May 28, 2024

It might be a little annoying having to import it from the other module though (and a type checker might complain about it):

from ntpath import relpath  # Do not import `ntpath`, use `os.path` or `ntpath.pure`.
from ntpath.pure import *

Maybe we could define ntpath.relpath() & ntpath.pure.relpath() separately, with the latter requiring 2 arguments and not supporting absolute paths? ntpath.relpath() could then simply call ntpath.pure.relpath() after making the paths absolute.

@nineteendo
Copy link
Contributor

Could we discuss adding a submodule on Discourse? I was asked to not create new idea threads until 2025.

@barneygale
Copy link
Contributor Author

TBH I consider this a pretty low-priority issue and not worth raising on discourse. It's been this way for a long time. If/when other core devs encounter this issue, they might search the bug tracker, arrive here, and give their tuppence.

@zooba
Copy link
Member

zooba commented Jun 17, 2024

I don't think this is worth adding submodules. Fallback implementations or raise NotImplementedError for those that are really unusable ought to be enough. Availability should already be documented, but we can fix those if necessary (and static analysis tools could/should use that information to provide warnings that we don't).

We've got little to no chance of moving any existing functions into new modules, so no compatibility issues are resolved. But I think we can justify replacing unexplained OSError/AttributeErrors with NotImplementedError (or a NotSupportedError if we have one?) across a set of functions.

@nineteendo
Copy link
Contributor

Note that even this conservative check doesn't work for relpath() as zipfile.Path.relative_to() relies on this handling the garbage on Windows, which I really don't like.

diff --git a/Lib/posixpath.py b/Lib/posixpath.py
index fccca4e066b..dd431c4dbf7 100644
--- a/Lib/posixpath.py
+++ b/Lib/posixpath.py
@@ -516,8 +516,13 @@ def relpath(path, start=None):
         start = os.fspath(start)
 
     try:
-        start_tail = abspath(start).lstrip(sep)
-        path_tail = abspath(path).lstrip(sep)
+        start_abs = abspath(start)
+        path_abs = abspath(path)
+        if not isabs(start_abs) or not isabs(path_abs):
+            raise ValueError("paths are not absolute")
+
+        start_tail = start_abs.lstrip(sep)
+        path_tail = path_abs.lstrip(sep)
         start_list = start_tail.split(sep) if start_tail else []
         path_list = path_tail.split(sep) if path_tail else []
         # Work out how much of the filepath is shared by start and path.

At least the submodules could make this work without breaking anyone's code. And all safe functions can be refactored to them, making the code easier to maintain. Other approaches actually seem more complicated to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir
Projects
None yet
Development

No branches or pull requests

3 participants