-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
bpo-46227: Add pathlib.Path.walk method #30340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-46227: Add pathlib.Path.walk method #30340
Conversation
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA). CLA MissingOur records indicate the following people have not signed the CLA: For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. If you have recently signed the CLA, please wait at least one business day You can check yourself to see if the CLA has been received. Thanks again for the contribution, we look forward to reviewing it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for submitting a PR!
Please, note that testing newly added code is probably a must. Adding tests will increase your chances to get this new feature accepted.
If you have any problems with CLA (which is also required), please, feel free to ping me - I will try to help you 🙂
Misc/NEWS.d/next/Library/2022-01-02-21-41-52.bpo-46277.NquRZl.rst
Outdated
Show resolved
Hide resolved
I'd support adding some way to walk a directory hierarchy in pathlib. But I'm not a huge fan of the Here's a possible API for discussion, still using diff --git a/Lib/pathlib.py b/Lib/pathlib.py
index f1a33178e2..23de30abc6 100644
--- a/Lib/pathlib.py
+++ b/Lib/pathlib.py
@@ -1008,15 +1008,19 @@ def samefile(self, other_path):
other_st = self._accessor.stat(other_path)
return os.path.samestat(st, other_st)
- def iterdir(self):
+ def iterdir(self, recurse=False, dirs=True, files=True):
"""Iterate over the files in this directory. Does not yield any
result for the special paths '.' and '..'.
"""
- for name in self._accessor.listdir(self):
- if name in {'.', '..'}:
- # Yielding a path object for these makes little sense
- continue
- yield self._make_child_relpath(name)
+ for root, dirnames, filenames in os.walk(self):
+ if dirs:
+ for name in dirnames:
+ yield type(self)(root, name)
+ if files:
+ for name in filenames:
+ yield type(self)(root, name)
+ if not recurse:
+ break
def glob(self, pattern):
"""Iterate over this subtree and yield all existing files (of any |
@barneygale I agree that os.walk API seems pretty clunky. I do, however, believe that the one you're offering is basically So I believe that we should not add walk's logic into iterdir. We could make something like iterwalk that does all the same things as os.walk with a bit of a simpler api. However, I am not sure how to simplify os.walk API without losing its core features. There's the issue that os.walk provides you with dir-by-dir iteration by yielding root which is pretty useful in many situations while your solution does not. Also, os.walk distinguishes between dirs and files so you don't have to and provides you with a simple way to prevent recursion into certain subtrees by removing elements from dirs list, which is pretty hard to get with any simpler APIs. I will commit the changes that add this feature into my implementation tomorrow. |
Thanks. I didn't know you could edit the list of directories to prevent recursion - neat! One more thing to consider: in part for performance reasons, pathlib internally represents a path as a drive, root, and list of string parts. This avoids some costly splitting and joining on directory separators -- for example, Pathlib has historically re-implemented algorithms like In the last couple releases I've removed pathlib's own In that case of your patch, the key line is I suspect this is all fine, and that practicality beats purity here. |
My implementation certainly requires quite a bit of review -- I've never commited to pathlib or to CPython in general. Implementation-wise, I could just reimplement os.walk or just fix the issues in my implementation -- either way is fine with me. But I think the most important issue at hand is the API for the Path.walk. |
Added the changes I've mentioned. All the features of os.walk are supported now. Let's move our discussion here: |
I will add the tests soon. If anyone has any ideas about what tests we need specifically, I'd love to hear them. |
This PR is stale because it has been open for 30 days with no activity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some text to docs, please?
Otherwise how will users know this is available and how to use.
ebf6811
to
a82baed
Compare
Every change to Python requires a NEWS entry. Please, add it using the blurb_it Web app or the blurb command-line tool. |
https://bugs.python.org/issue46227