Commit c1e9298

gh-90385: Add pathlib.Path.walk() method (GH-92517)
Automerge-Triggered-By: GH:brettcannon
1 parent e4d3a96 commit c1e9298

5 files changed: +338 -1 lines changed

Doc/library/pathlib.rst

+96
@@ -946,6 +946,101 @@ call fails (for example because the path doesn't exist).
    to the directory after creating the iterator, whether a path object for
    that file be included is unspecified.
 
+.. method:: Path.walk(top_down=True, on_error=None, follow_symlinks=False)
+
+   Generate the file names in a directory tree by walking the tree
+   either top-down or bottom-up.
+
+   For each directory in the directory tree rooted at *self* (including
+   *self* but excluding '.' and '..'), the method yields a 3-tuple of
+   ``(dirpath, dirnames, filenames)``.
+
+   *dirpath* is a :class:`Path` to the directory currently being walked,
+   *dirnames* is a list of strings for the names of subdirectories in *dirpath*
+   (excluding ``'.'`` and ``'..'``), and *filenames* is a list of strings for
+   the names of the non-directory files in *dirpath*. To get a full path
+   (which begins with *self*) to a file or directory in *dirpath*, do
+   ``dirpath / name``. Whether or not the lists are sorted is file
+   system-dependent.
+
+   If the optional argument *top_down* is true (which is the default), the triple for a
+   directory is generated before the triples for any of its subdirectories
+   (directories are walked top-down).  If *top_down* is false, the triple
+   for a directory is generated after the triples for all of its subdirectories
+   (directories are walked bottom-up). No matter the value of *top_down*, the
+   list of subdirectories is retrieved before the triples for the directory and
+   its subdirectories are walked.
+
+   When *top_down* is true, the caller can modify the *dirnames* list in-place
+   (for example, using :keyword:`del` or slice assignment), and :meth:`Path.walk`
+   will only recurse into the subdirectories whose names remain in *dirnames*.
+   This can be used to prune the search, or to impose a specific order of visiting,
+   or even to inform :meth:`Path.walk` about directories the caller creates or
+   renames before it resumes :meth:`Path.walk` again. Modifying *dirnames* when
+   *top_down* is false has no effect on the behavior of :meth:`Path.walk` since the
+   directories in *dirnames* have already been generated by the time *dirnames*
+   is yielded to the caller.
+
+   By default, errors from :func:`os.scandir` are ignored.  If the optional
+   argument *on_error* is specified, it should be a callable; it will be
+   called with one argument, an :exc:`OSError` instance. The callable can handle the
+   error to continue the walk or re-raise it to stop the walk. Note that the
+   filename is available as the ``filename`` attribute of the exception object.
+
+   By default, :meth:`Path.walk` does not follow symbolic links, and instead adds them
+   to the *filenames* list. Set *follow_symlinks* to true to resolve symlinks
+   and place them in *dirnames* and *filenames* as appropriate for their targets, and
+   consequently visit directories pointed to by symlinks (where supported).
+
+   .. note::
+
+      Be aware that setting *follow_symlinks* to true can lead to infinite
+      recursion if a link points to a parent directory of itself. :meth:`Path.walk`
+      does not keep track of the directories it has already visited.
+
+   .. note::
+      :meth:`Path.walk` assumes the directories it walks are not modified during
+      execution. For example, if a directory from *dirnames* has been replaced
+      with a symlink and *follow_symlinks* is false, :meth:`Path.walk` will
+      still try to descend into it. To prevent such behavior, remove directories
+      from *dirnames* as appropriate.
+
+   .. note::
+
+      Unlike :func:`os.walk`, :meth:`Path.walk` lists symlinks to directories in
+      *filenames* if *follow_symlinks* is false.
+
+   This example displays the number of bytes used by all files in each directory,
+   while ignoring ``__pycache__`` directories::
+
+      from pathlib import Path
+      for root, dirs, files in Path("cpython/Lib/concurrent").walk(on_error=print):
+          print(
+              root,
+              "consumes",
+              sum((root / file).stat().st_size for file in files),
+              "bytes in",
+              len(files),
+              "non-directory files"
+          )
+          if '__pycache__' in dirs:
+              dirs.remove('__pycache__')
+
+   This next example is a simple implementation of :func:`shutil.rmtree`.
+   Walking the tree bottom-up is essential as :meth:`Path.rmdir` doesn't allow
+   deleting a directory before it is empty::
+
+      # Delete everything reachable from the directory "top".
+      # CAUTION:  This is dangerous! For example, if top == Path('/'),
+      # it could delete all of your files.
+      for root, dirs, files in top.walk(top_down=False):
+          for name in files:
+              (root / name).unlink()
+          for name in dirs:
+              (root / name).rmdir()
+
+   .. versionadded:: 3.12
+
 .. method:: Path.lchmod(mode)
 
    Like :meth:`Path.chmod` but, if the path points to a symbolic link, the
@@ -1285,6 +1380,7 @@ Below is a table mapping various :mod:`os` functions to their corresponding
 :func:`os.path.expanduser`             :meth:`Path.expanduser` and
                                        :meth:`Path.home`
 :func:`os.listdir`                     :meth:`Path.iterdir`
+:func:`os.walk`                        :meth:`Path.walk`
 :func:`os.path.isdir`                  :meth:`Path.is_dir`
 :func:`os.path.isfile`                 :meth:`Path.is_file`
 :func:`os.path.islink`                 :meth:`Path.is_symlink`

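The pruning and ordering behaviour described in the documentation hunk above can be tried with a small sketch. The helper names `iter_tree` and `visited_dirs` are hypothetical and not part of this commit. The `os.walk` fallback is an assumption added only so the sketch also runs on interpreters older than 3.12; it should behave the same as `Path.walk` for a tree that contains no symlinks.

```python
import os
import tempfile
from pathlib import Path


def iter_tree(top):
    """Yield (dirpath, dirnames, filenames) top-down, with dirpath as a Path.

    Hypothetical helper: uses Path.walk() where it exists (Python 3.12+),
    otherwise adapts os.walk(). Both yield the live dirnames list, so
    in-place edits made by the caller steer the rest of the walk.
    """
    if hasattr(Path, "walk"):
        yield from top.walk()
    else:
        for dirpath, dirnames, filenames in os.walk(top):
            yield Path(dirpath), dirnames, filenames


def visited_dirs(top, skip=("__pycache__",)):
    """Return the visited directories, relative to top, in visiting order."""
    seen = []
    for dirpath, dirnames, _filenames in iter_tree(top):
        seen.append(dirpath.relative_to(top).as_posix())
        # Slice assignment mutates the list in place: this both prunes
        # the skipped names and imposes a sorted visiting order.
        dirnames[:] = sorted(d for d in dirnames if d not in skip)
    return seen


with tempfile.TemporaryDirectory() as tmp:
    top = Path(tmp)
    for sub in ("b", "a/__pycache__", "a/x"):
        (top / sub).mkdir(parents=True)
    order = visited_dirs(top)
    print(order)
```

Rebinding the name instead (`dirnames = sorted(...)`) would create a new list and leave the walk unchanged, which is why the documentation insists on in-place modification.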
Lib/pathlib.py

+43
@@ -1321,6 +1321,49 @@ def expanduser(self):
 
         return self
 
+    def walk(self, top_down=True, on_error=None, follow_symlinks=False):
+        """Walk the directory tree from this directory, similar to os.walk()."""
+        sys.audit("pathlib.Path.walk", self, on_error, follow_symlinks)
+        return self._walk(top_down, on_error, follow_symlinks)
+
+    def _walk(self, top_down, on_error, follow_symlinks):
+        # We may not have read permission for self, in which case we can't
+        # get a list of the files the directory contains. os.walk
+        # always suppressed the exception then, rather than blow up for a
+        # minor reason when (say) a thousand readable directories are still
+        # left to visit. That logic is copied here.
+        try:
+            scandir_it = self._scandir()
+        except OSError as error:
+            if on_error is not None:
+                on_error(error)
+            return
+
+        with scandir_it:
+            dirnames = []
+            filenames = []
+            for entry in scandir_it:
+                try:
+                    is_dir = entry.is_dir(follow_symlinks=follow_symlinks)
+                except OSError:
+                    # Carried over from os.path.isdir().
+                    is_dir = False
+
+                if is_dir:
+                    dirnames.append(entry.name)
+                else:
+                    filenames.append(entry.name)
+
+        if top_down:
+            yield self, dirnames, filenames
+
+        for dirname in dirnames:
+            dirpath = self._make_child_relpath(dirname)
+            yield from dirpath._walk(top_down, on_error, follow_symlinks)
+
+        if not top_down:
+            yield self, dirnames, filenames
+
 
 class PosixPath(Path, PurePosixPath):
     """Path subclass for non-Windows systems.

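To experiment with the recursive algorithm in the hunk above outside of `pathlib`, a standalone sketch can be built directly on `os.scandir`. The function name `walk_compat` is hypothetical, and the sketch leaves out the `sys.audit` call and the private `_scandir` and `_make_child_relpath` helpers, so it is an approximation of the committed method rather than a copy of it.

```python
import os
import tempfile
from pathlib import Path


def walk_compat(top, top_down=True, on_error=None, follow_symlinks=False):
    """Hypothetical stand-in for Path.walk(), built on os.scandir()."""
    try:
        scandir_it = os.scandir(top)
    except OSError as error:
        # Unreadable or missing directory: report it (if asked) and skip.
        if on_error is not None:
            on_error(error)
        return

    dirnames, filenames = [], []
    # The iterator is closed before anything is yielded, so a suspended
    # generator does not keep a directory handle open.
    with scandir_it:
        for entry in scandir_it:
            try:
                is_dir = entry.is_dir(follow_symlinks=follow_symlinks)
            except OSError:
                is_dir = False
            (dirnames if is_dir else filenames).append(entry.name)

    if top_down:
        yield top, dirnames, filenames
    for name in dirnames:
        yield from walk_compat(top / name, top_down, on_error, follow_symlinks)
    if not top_down:
        yield top, dirnames, filenames


with tempfile.TemporaryDirectory() as tmp:
    top = Path(tmp)
    (top / "pkg" / "sub").mkdir(parents=True)
    (top / "pkg" / "mod.py").write_text("x = 1\n")

    # Bottom-up: children are yielded before their parents.
    bottom_up = [d.relative_to(top).as_posix()
                 for d, _dirs, _files in walk_compat(top, top_down=False)]

    # A failing scandir() call is passed to on_error instead of raising.
    errors = []
    missing = list(walk_compat(top / "missing", on_error=errors.append))

print(bottom_up, missing, [type(e).__name__ for e in errors])
```

Because the error callback receives the `OSError` instance, a caller can log `error.filename` and continue, or re-raise inside the callback to abort the walk.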
Lib/test/support/os_helper.py

+1-1
@@ -572,7 +572,7 @@ def fs_is_case_insensitive(directory):
 
 
 class FakePath:
-    """Simple implementing of the path protocol.
+    """Simple implementation of the path protocol.
     """
     def __init__(self, path):
         self.path = path

Lib/test/test_pathlib.py

+197
@@ -2478,6 +2478,203 @@ def test_complex_symlinks_relative(self):
     def test_complex_symlinks_relative_dot_dot(self):
         self._check_complex_symlinks(os.path.join('dirA', '..'))
 
+class WalkTests(unittest.TestCase):
+
+    def setUp(self):
+        self.addCleanup(os_helper.rmtree, os_helper.TESTFN)
+
+        # Build:
+        #     TESTFN/
+        #       TEST1/              a file kid and two directory kids
+        #         tmp1
+        #         SUB1/             a file kid and a directory kid
+        #           tmp2
+        #           SUB11/          no kids
+        #         SUB2/             a file kid and a dirsymlink kid
+        #           tmp3
+        #           SUB21/          not readable
+        #             tmp5
+        #           link/           a symlink to TEST2
+        #           broken_link
+        #           broken_link2
+        #           broken_link3
+        #       TEST2/
+        #         tmp4              a lone file
+        self.walk_path = pathlib.Path(os_helper.TESTFN, "TEST1")
+        self.sub1_path = self.walk_path / "SUB1"
+        self.sub11_path = self.sub1_path / "SUB11"
+        self.sub2_path = self.walk_path / "SUB2"
+        sub21_path = self.sub2_path / "SUB21"
+        tmp1_path = self.walk_path / "tmp1"
+        tmp2_path = self.sub1_path / "tmp2"
+        tmp3_path = self.sub2_path / "tmp3"
+        tmp5_path = sub21_path / "tmp3"
+        self.link_path = self.sub2_path / "link"
+        t2_path = pathlib.Path(os_helper.TESTFN, "TEST2")
+        tmp4_path = pathlib.Path(os_helper.TESTFN, "TEST2", "tmp4")
+        broken_link_path = self.sub2_path / "broken_link"
+        broken_link2_path = self.sub2_path / "broken_link2"
+        broken_link3_path = self.sub2_path / "broken_link3"
+
+        os.makedirs(self.sub11_path)
+        os.makedirs(self.sub2_path)
+        os.makedirs(sub21_path)
+        os.makedirs(t2_path)
+
+        for path in tmp1_path, tmp2_path, tmp3_path, tmp4_path, tmp5_path:
+            with open(path, "x", encoding='utf-8') as f:
+                f.write(f"I'm {path} and proud of it.  Blame test_pathlib.\n")
+
+        if os_helper.can_symlink():
+            os.symlink(os.path.abspath(t2_path), self.link_path)
+            os.symlink('broken', broken_link_path, True)
+            os.symlink(pathlib.Path('tmp3', 'broken'), broken_link2_path, True)
+            os.symlink(pathlib.Path('SUB21', 'tmp5'), broken_link3_path, True)
+            self.sub2_tree = (self.sub2_path, ["SUB21"],
+                              ["broken_link", "broken_link2", "broken_link3",
+                               "link", "tmp3"])
+        else:
+            self.sub2_tree = (self.sub2_path, ["SUB21"], ["tmp3"])
+
+        if not is_emscripten:
+            # Emscripten fails with inaccessible directories.
+            os.chmod(sub21_path, 0)
+        try:
+            os.listdir(sub21_path)
+        except PermissionError:
+            self.addCleanup(os.chmod, sub21_path, stat.S_IRWXU)
+        else:
+            os.chmod(sub21_path, stat.S_IRWXU)
+            os.unlink(tmp5_path)
+            os.rmdir(sub21_path)
+            del self.sub2_tree[1][:1]
+
+    def test_walk_topdown(self):
+        all = list(self.walk_path.walk())
+
+        self.assertEqual(len(all), 4)
+        # We can't know which order SUB1 and SUB2 will appear in.
+        # Not flipped:  TESTFN, SUB1, SUB11, SUB2
+        #     flipped:  TESTFN, SUB2, SUB1, SUB11
+        flipped = all[0][1][0] != "SUB1"
+        all[0][1].sort()
+        all[3 - 2 * flipped][-1].sort()
+        all[3 - 2 * flipped][1].sort()
+        self.assertEqual(all[0], (self.walk_path, ["SUB1", "SUB2"], ["tmp1"]))
+        self.assertEqual(all[1 + flipped], (self.sub1_path, ["SUB11"], ["tmp2"]))
+        self.assertEqual(all[2 + flipped], (self.sub11_path, [], []))
+        self.assertEqual(all[3 - 2 * flipped], self.sub2_tree)
+
+    def test_walk_prune(self, walk_path=None):
+        if walk_path is None:
+            walk_path = self.walk_path
+        # Prune the search.
+        all = []
+        for root, dirs, files in walk_path.walk():
+            all.append((root, dirs, files))
+            if 'SUB1' in dirs:
+                # Note that this also mutates the dirs we appended to all!
+                dirs.remove('SUB1')
+
+        self.assertEqual(len(all), 2)
+        self.assertEqual(all[0], (self.walk_path, ["SUB2"], ["tmp1"]))
+
+        all[1][-1].sort()
+        all[1][1].sort()
+        self.assertEqual(all[1], self.sub2_tree)
+
+    def test_file_like_path(self):
+        self.test_walk_prune(FakePath(self.walk_path).__fspath__())
+
+    def test_walk_bottom_up(self):
+        all = list(self.walk_path.walk(top_down=False))
+
+        self.assertEqual(len(all), 4, all)
+        # We can't know which order SUB1 and SUB2 will appear in.
+        # Not flipped:  SUB11, SUB1, SUB2, TESTFN
+        #     flipped:  SUB2, SUB11, SUB1, TESTFN
+        flipped = all[3][1][0] != "SUB1"
+        all[3][1].sort()
+        all[2 - 2 * flipped][-1].sort()
+        all[2 - 2 * flipped][1].sort()
+        self.assertEqual(all[3],
+                         (self.walk_path, ["SUB1", "SUB2"], ["tmp1"]))
+        self.assertEqual(all[flipped],
+                         (self.sub11_path, [], []))
+        self.assertEqual(all[flipped + 1],
+                         (self.sub1_path, ["SUB11"], ["tmp2"]))
+        self.assertEqual(all[2 - 2 * flipped],
+                         self.sub2_tree)
+
+    @os_helper.skip_unless_symlink
+    def test_walk_follow_symlinks(self):
+        walk_it = self.walk_path.walk(follow_symlinks=True)
+        for root, dirs, files in walk_it:
+            if root == self.link_path:
+                self.assertEqual(dirs, [])
+                self.assertEqual(files, ["tmp4"])
+                break
+        else:
+            self.fail("Didn't follow symlink with follow_symlinks=True")
+
+    def test_walk_symlink_location(self):
+        # Tests whether symlinks end up in filenames or dirnames depending
+        # on the `follow_symlinks` argument.
+        walk_it = self.walk_path.walk(follow_symlinks=False)
+        for root, dirs, files in walk_it:
+            if root == self.sub2_path:
+                self.assertIn("link", files)
+                break
+        else:
+            self.fail("symlink not found")
+
+        walk_it = self.walk_path.walk(follow_symlinks=True)
+        for root, dirs, files in walk_it:
+            if root == self.sub2_path:
+                self.assertIn("link", dirs)
+                break
+
+    def test_walk_bad_dir(self):
+        errors = []
+        walk_it = self.walk_path.walk(on_error=errors.append)
+        root, dirs, files = next(walk_it)
+        self.assertEqual(errors, [])
+        dir1 = 'SUB1'
+        path1 = root / dir1
+        path1new = (root / dir1).with_suffix(".new")
+        path1.rename(path1new)
+        try:
+            roots = [r for r, _, _ in walk_it]
+            self.assertTrue(errors)
+            self.assertNotIn(path1, roots)
+            self.assertNotIn(path1new, roots)
+            for dir2 in dirs:
+                if dir2 != dir1:
+                    self.assertIn(root / dir2, roots)
+        finally:
+            path1new.rename(path1)
+
+    def test_walk_many_open_files(self):
+        depth = 30
+        base = pathlib.Path(os_helper.TESTFN, 'deep')
+        path = pathlib.Path(base, *(['d']*depth))
+        path.mkdir(parents=True)
+
+        iters = [base.walk(top_down=False) for _ in range(100)]
+        for i in range(depth + 1):
+            expected = (path, ['d'] if i else [], [])
+            for it in iters:
+                self.assertEqual(next(it), expected)
+            path = path.parent
+
+        iters = [base.walk(top_down=True) for _ in range(100)]
+        path = base
+        for i in range(depth + 1):
+            expected = (path, ['d'] if i < depth else [], [])
+            for it in iters:
+                self.assertEqual(next(it), expected)
+            path = path / 'd'
+
 
 class PathTest(_BasePathTest, unittest.TestCase):
     cls = pathlib.Path
@@ -0,0 +1 @@
+Add :meth:`pathlib.Path.walk` as an alternative to :func:`os.walk`.

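As a usage sketch of the new method in its role as an `os.walk` alternative, the bottom-up deletion pattern from the documentation can be exercised on a throwaway tree. The names `iter_bottom_up` and `remove_tree` are hypothetical, the final `top.rmdir()` is an addition that the documentation example does not make, and the `os.walk` fallback is an assumption for pre-3.12 interpreters that only holds for trees without symlinks to directories.

```python
import os
import tempfile
from pathlib import Path


def iter_bottom_up(top):
    """Yield (dirpath, dirnames, filenames) with children before parents."""
    if hasattr(Path, "walk"):
        # Note the keyword spelling: top_down, not os.walk's topdown.
        yield from top.walk(top_down=False)
    else:
        for dirpath, dirnames, filenames in os.walk(top, topdown=False):
            yield Path(dirpath), dirnames, filenames


def remove_tree(top):
    """Delete everything below top, then top itself. Destructive!"""
    for root, dirs, files in iter_bottom_up(top):
        for name in files:
            (root / name).unlink()
        # Subdirectories were visited earlier, so they are empty by now.
        for name in dirs:
            (root / name).rmdir()
    top.rmdir()


with tempfile.TemporaryDirectory() as tmp:
    victim = Path(tmp) / "victim"
    (victim / "a" / "b").mkdir(parents=True)
    (victim / "a" / "b" / "f.txt").write_text("data")
    (victim / "top.txt").write_text("data")
    remove_tree(victim)
    still_there = victim.exists()
    print(still_there)
```

Walking top-down here would fail at the first `rmdir()` call, because the directory would still contain entries at that point.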