Many tools that scan directory trees offer an option to follow symbolic links, but with the caveat that the tool may get stuck in infinite recursion should a link loop back on itself.
This tool walks directory trees while detecting and preventing symlink recursion.
It can also be run in a simpler mode to resolve a single path or list only the immediate members of a directory.
Requires Python 3.10 or later.
The package has no dependencies beyond the Python Standard Library.
Basic usage of the command line script would look something like this:
> python3 symlinkwalk.py --resolve=path foo
d /full/path/to/foo
The --resolve (or -r) option lets you set the resolving mode to one of:
- path: prints the absolute path to foo with any symlinks resolved
- list: prints a listing of the foo directory, with one path per line; all member paths will be absolute and fully resolved
- tree (the default): prints the entire directory tree rooted at foo, following a depth-first traversal order
Each line printed to the standard output consists of a code, a single space, and an absolute path. This code may be one of:
- d: path is to an existing directory
- f: path is to an existing non-directory
- m: path is to a missing object
- b: path is to a broken symlink
- r: path is to a recursive symlink
- x: path was excluded by an --exclude pattern
- u#: the same unique path was encountered # times, where # ≥ 2 (this only appears in conjunction with the --unique-paths option)
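If you want to post-process the listing, the line format described above is simple enough to parse. The following is a minimal sketch rather than part of the package: OutputLine and parse_line are hypothetical names, and it assumes a u# code is written as the letter u followed directly by decimal digits (for example u2).

```python
import re
from typing import NamedTuple, Optional


class OutputLine(NamedTuple):
    code: str             # one of d, f, m, b, r, x, u
    count: Optional[int]  # hit count for u# lines, otherwise None
    path: str             # absolute path exactly as printed


# A code, a single space, then the path (which may itself contain spaces).
# Assumption: "u#" is rendered as "u" plus decimal digits, e.g. "u2".
_LINE_RE = re.compile(r'^(?:(?P<code>[dfmbrx])|u(?P<count>\d+)) (?P<path>.+)$')


def parse_line(line: str) -> OutputLine:
    """Split one line of output into its code, optional count, and path."""
    match = _LINE_RE.match(line.rstrip('\n'))
    if match is None:
        raise ValueError(f'unrecognised output line: {line!r}')
    if match['count'] is not None:
        return OutputLine('u', int(match['count']), match['path'])
    return OutputLine(match['code'], None, match['path'])
```

Splitting on the first space only is what keeps paths containing spaces intact.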
d and f lines are printed as the item is encountered, while the rest of the codes are gathered during execution and printed at the end. While an f path would most likely be to a regular file (hence the 'f'), it could potentially be something more exotic, such as a device or fifo.
An m path, while missing at some level, is a path you ought to be able to complete, say as a directory, using mkdir -p /path/to/missing/foo.
The same cannot be said for b or r paths. In --resolve=path mode, if a broken/recursive symlink is encountered anywhere along the path you supply, the printed path will be an absolute path to the symlink itself with anything beyond it discarded. For example, say your path was foo/bar/baz, of which foo turned out to be a broken symlink. The output would be something like b /path/to/broken/foo. This is the path of concern you need to address, and the bar/baz part becomes superfluous at this point.
Regarding r paths, symlink recursion is only detected after the symlink is followed once. If the path walk eventually circles back to the same symlink, it will be flagged as recursive with the r code. If there are several symlinks forming a loop, only one of them may get flagged, but that should be enough to stop the infinite recursion. The point, though, is that you should be aware that the algorithm may not exhaustively identify all recursive symlinks.
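To make the general idea of loop prevention concrete, here is a minimal standard-library sketch. It is not the algorithm this tool uses (which, as noted, flags a link only after following it once); instead it refuses to descend into any directory whose real path is already open on the current branch. The function name walk_following_links and its loops parameter are hypothetical.

```python
import os
from typing import Iterator, List, Optional, Set


def walk_following_links(top: str,
                         loops: Optional[List[str]] = None) -> Iterator[str]:
    """Yield directories under top, following symlinks without looping.

    Paths that lead back to a directory already open on the current
    branch are appended to loops instead of being descended into.
    """
    if loops is None:
        loops = []
    active: Set[str] = set()  # real paths of directories on this branch

    def visit(path: str) -> Iterator[str]:
        real = os.path.realpath(path)
        if real in active:
            loops.append(path)  # following this would recurse forever
            return
        active.add(real)
        try:
            yield path
            with os.scandir(path) as entries:
                # os.path.isdir() follows links and returns False on errors
                children = sorted(e.path for e in entries
                                  if os.path.isdir(e.path))
            for child in children:
                yield from visit(child)
        finally:
            active.discard(real)

    yield from visit(top)
```

As with the tool itself, only the link that closes the loop is reported; other links taking part in the same cycle are simply followed.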
path/to/symlinkwalk.py [-h] [-r MODE] [-x PATTERN] [-u] [TARGET ...]
TARGET
- zero or more target paths
- these may be either absolute or relative to the current working directory
- defaults to the current working directory if no targets supplied
-r MODE or --resolve=MODE
- MODE = one of path, list, or tree as described earlier
- defaults to tree
-x PATTERN or --exclude=PATTERN
- PATTERN = a glob pattern that can be used to exclude certain paths
- you may specify multiple -x arguments
-u or --unique-paths
- normally, the same path may appear multiple times in a listing
  - this may happen if you provide multiple targets that overlap
  - it may also happen if multiple symlinks direct to the same location
- -u should prevent such paths appearing in f or d lines more than once
- you can still screen for u# lines to see which would have done so
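For readers who want to wrap or imitate this interface, the synopsis above maps naturally onto argparse. The sketch below is an illustrative reconstruction and not the script's actual parser: build_parser is a hypothetical name, and representing the default target as '.' is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented synopsis (a sketch only)."""
    parser = argparse.ArgumentParser(prog='symlinkwalk.py')
    parser.add_argument('-r', '--resolve', metavar='MODE',
                        choices=('path', 'list', 'tree'), default='tree',
                        help='resolving mode (default: tree)')
    # action='append' lets -x be given several times
    parser.add_argument('-x', '--exclude', metavar='PATTERN',
                        action='append', default=[],
                        help='glob pattern of paths to exclude')
    parser.add_argument('-u', '--unique-paths', action='store_true',
                        help='print each unique path at most once')
    # Assumption: the current working directory is represented as '.'
    parser.add_argument('targets', metavar='TARGET', nargs='*',
                        default=['.'], help='target paths')
    return parser
```

For example, build_parser().parse_args(['-x', '*.tmp', '-u']) leaves resolve at its tree default and falls back to the current directory as the only target.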
Note: The documentation in this section is structured like a tutorial. If you are looking for an API reference, consult pydoc instead.
The safe_follow_symlinks package is built around 2 central classes, each defined within its own module:
- symlinkwalk.SymlinkWalk
- support.pathref.PathRef
SymlinkWalk implements the core functionality, but uses PathRef exclusively when working with paths.
PathRef unifies a number of common types used to represent paths by the Python Standard Library. Its sole ref attribute can be any of the following path-like types:
- str
- bytes
- pathlib.PurePath (and all of its subclasses including pathlib.Path)
- os.DirEntry (the type generated by iterating os.scandir())
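To illustrate what unifying these types involves, the sketch below converts any of them to a pathlib.Path using only the standard library. It is not PathRef's implementation; to_path and PathLikeRef are hypothetical names chosen for this example.

```python
import os
import pathlib
from typing import Union

# The four kinds of reference listed above
PathLikeRef = Union[str, bytes, pathlib.PurePath, os.DirEntry]


def to_path(ref: PathLikeRef) -> pathlib.Path:
    """Return ref as a concrete pathlib.Path."""
    if isinstance(ref, pathlib.Path):
        return ref
    # os.fsdecode() accepts str, bytes, and any os.PathLike object
    # (PurePath and DirEntry both qualify) and always returns str.
    return pathlib.Path(os.fsdecode(ref))
```

Keeping the original object around, as PathRef does with its ref attribute, has the advantage that information cached by an os.DirEntry is not thrown away by the conversion.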
PathRef itself is a path-like type. It includes 2 useful properties:
- path (pathlib.Path, read-only): the ref attribute in Path form
- path_or_entry (pathlib.Path or os.DirEntry, read-only): this can be useful since those two classes have a lot of APIs in common
The built-in exists() method works much like pathlib.Path.exists(). You can call it on any PathRef.
For those generated by SymlinkWalk, you may also call:
- is_broken_link(): path is a symlink pointing to nothing that exists?
  - exists() will return False in this case
  - this is in keeping with Path.exists() behaviour
- is_recursive_link(): path is a symlink flagged as recursive?
  - note that a False return value does not imply it is not recursive; the SymlinkWalk algorithm may not have detected the recursion
- is_bad_link(): path is a broken or recursive symlink (this simply combines the above 2 calls)
- is_bad_path(): path does not exist or is a bad link
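The notion of a broken link can be expressed with plain pathlib calls, which may help when reasoning about what these methods report. This is a standalone sketch, not the PathRef method of the same name, and it makes no attempt to tell recursive links apart from merely dangling ones.

```python
from pathlib import Path


def is_broken_link(path: Path) -> bool:
    """Return True if path is a symlink whose target cannot be reached."""
    # is_symlink() inspects the link itself, while exists() follows it,
    # so a dangling link is a symlink for which exists() is False.
    return path.is_symlink() and not path.exists()
```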
As mentioned earlier, SymlinkWalk implements the core functionality. Its public attributes can be divided into those you can optionally supply through its __init__() method as input and those generated as output when you call one of its primary methods.
The input args include:
- path_filter (callback): accepts/rejects paths being iterated
- yield_unique (bool): never yield the same path twice?
  - multiple symlinks can sometimes point to the same place
These args influence the iterating algorithms implemented by iter_dir() and iter_tree(). Suppose you only wanted to list zip files? You could write a path filter like this:

    def zip_file_filter(pr: PathRef) -> bool:
        return pr.path.suffix == '.zip'

When instantiating a SymlinkWalk, you can optionally treat it as a context manager.

    with SymlinkWalk(path_filter=zip_file_filter) as slw:
        for pr in slw.iter_tree('foo'):
            print(pr)

SymlinkWalk generates a lot of state, so this ensures that it gets released when you are done with it (or at least flagged for release by the garbage collector).
There are 3 primary methods you can call to resolve paths.
resolve_path() is actually a class method, so you need not instantiate a SymlinkWalk to call it. (It does so internally.) You give it a PathRef and it returns another PathRef containing an absolute version of your input with all symlinks resolved where possible.

    pr = SymlinkWalk.resolve_path(PathRef('foo'))
    if pr.is_bad_path():
        print('there is a problem with the path:', pr)

It is a good idea to call some of those PathRef methods on the result to make sure nothing went wrong during path resolution. Alternatively, you can use the strict=True option to have it raise an exception in that case.
iter_dir() is a generator that yields all paths in a directory in fully-resolved form, provided:
- The path is not rejected by your custom path_filter.
- The path can, in fact, be properly resolved to an existing object.
- Either yield_unique=False or this is the first encounter of the path.
If a path fails one of these conditions, it is recorded in the skipped, bad_paths, or path_hits output attribute, respectively.
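The three conditions and their bookkeeping can be pictured with a small standard-library generator. This is a rough sketch under stated assumptions, not the package's code: iter_dir_sketch and WalkState are hypothetical names, the container types (two lists and a dict of counts) are guesses, and the order in which the real implementation applies its checks may differ.

```python
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, Iterator, List, Optional


@dataclass
class WalkState:
    """Hypothetical stand-in for the three output attributes."""
    skipped: List[Path] = field(default_factory=list)
    bad_paths: List[Path] = field(default_factory=list)
    path_hits: Dict[Path, int] = field(default_factory=dict)


def iter_dir_sketch(directory: Path, state: WalkState,
                    path_filter: Optional[Callable[[Path], bool]] = None,
                    yield_unique: bool = False) -> Iterator[Path]:
    """Yield resolved members of directory, recording rejected ones."""
    with os.scandir(directory) as entries:
        names = sorted(entry.path for entry in entries)
    for name in names:
        path = Path(name)
        if path_filter is not None and not path_filter(path):
            state.skipped.append(path)    # rejected by the filter
            continue
        if not path.exists():
            state.bad_paths.append(path)  # e.g. a broken symlink
            continue
        resolved = path.resolve()
        hits = state.path_hits.get(resolved, 0) + 1
        state.path_hits[resolved] = hits  # count every encounter
        if yield_unique and hits > 1:
            continue                      # already yielded once
        yield resolved
```

Counting hits on the resolved path, rather than on the name as listed, is what lets two different symlinks to the same file register as a repeat.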
Let's say you wanted to implement your own recursive directory tree-walking using iter_dir(). It might look something like this:

    def print_dir(dirPathRef, symLinkWalk, resolved=False):
        print(dirPathRef)
        for pr in symLinkWalk.iter_dir(dirPathRef, resolved):
            if pr.path_or_entry.is_dir():
                # paths yielded by iter_dir() are already resolved
                print_dir(pr, symLinkWalk, resolved=True)
            else:
                print(pr)

    with SymlinkWalk() as symLinkWalk:
        print_dir(PathRef('foo'), symLinkWalk)

iter_dir()'s resolved argument defaults to False, meaning that what you are passing in as your PathRef needs to be resolved before the iteration can commence. That makes sense the first time you call your print_dir() function, but in recursive calls, anything that is coming out of iter_dir() should already be resolved. There is no need to do so again. (Note that when resolved=True, expand_user is ignored.)
iter_tree() works much like iter_dir() except it walks through the whole directory tree for you. You could reduce the earlier example to this:

    for pr in symLinkWalk.iter_tree(PathRef('foo')):
        print(pr)