
RF idea: deprecate same_names in favor of a more generic layout parameter #555

Open
@yarikoptic


At the moment, CachingFileSystem has a single bool option, same_names, to switch the layout of files from /hash to /url-filename, and thus leaves no room for "improvement":

Under heavy use of the cache, a flat tree of files (/hash or /url-filename based) could lead to a single, very large directory, and the filesystem could become inefficient at listing that directory, etc.

  • A common workaround (look under .git/objects; the same approach is used by git-annex, girder, etc.) is to introduce leading directories. For example, for a /hash the path to the file could be /hash[:2]/hash[2:4]/hash[4:], thus reducing the impact on the file system.
  • For a URL-based path, it could simply be a path constructed from the URI components. For example, the URL http://domain/p1/p2/filename could become the path http/domain/p1/p2/filename. This would disambiguate between file systems etc., and also avoid conflicts for the same common filename (as I guess would happen now with same_names=True).
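For illustration, the two layouts from the bullets above could be computed roughly as follows. This is a minimal sketch, not fsspec code: the function names hashtree_path and url_fullpath are hypothetical, and SHA-256 of the full URL is only an assumed choice of hash.

```python
import hashlib
import posixpath
from urllib.parse import urlsplit


def hashtree_path(url: str) -> str:
    """Hypothetical helper: git-style fan-out of a hash key into leading dirs."""
    # Assumption: the cache key is the SHA-256 hex digest of the URL.
    h = hashlib.sha256(url.encode("utf-8")).hexdigest()
    # hash[:2]/hash[2:4]/hash[4:] caps each directory level at 256 entries.
    return posixpath.join(h[:2], h[2:4], h[4:])


def url_fullpath(url: str) -> str:
    """Hypothetical helper: mirror URI components as scheme/netloc/path dirs."""
    parts = urlsplit(url)
    # Skip empty, "." and ".." segments so the result stays inside the cache root.
    segments = [s for s in parts.path.split("/") if s not in ("", ".", "..")]
    return posixpath.join(parts.scheme, parts.netloc, *segments)
```

As sketched, url_fullpath ignores the query string, so URLs differing only in their query would still collide, and a host:port netloc contains ":", which is not a valid path character on Windows; a real implementation would have to decide how to handle both.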

With the above in mind, I think it would be nice if, instead of same_names, there were a layout={hash,hashtree,url_filename,url_fullpath} option or alike, allowing users to switch to the most appropriate layout for their use case.
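A rough sketch of how such a layout option could replace same_names, keeping the old flag working with a deprecation warning during a transition. Everything here is hypothetical (resolve_layout, cache_relpath, the LAYOUTS tuple, SHA-256 as the hash, and "hash" as the default on the assumption that it matches today's same_names=False); it is not the actual CachingFileSystem API.

```python
import hashlib
import posixpath
import warnings
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical layout names, taken from the proposal above.
LAYOUTS = ("hash", "hashtree", "url_filename", "url_fullpath")


def resolve_layout(layout: Optional[str] = None,
                   same_names: Optional[bool] = None) -> str:
    """Map the legacy same_names flag onto a layout name, warning if it is used."""
    if same_names is not None:
        warnings.warn(
            "same_names is deprecated; use layout='url_filename' or layout='hash'",
            DeprecationWarning,
            stacklevel=2,
        )
        implied = "url_filename" if same_names else "hash"
        if layout is not None and layout != implied:
            raise ValueError(f"same_names={same_names!r} conflicts with layout={layout!r}")
        layout = implied
    if layout is None:
        layout = "hash"  # assumed default, mirroring same_names=False
    if layout not in LAYOUTS:
        raise ValueError(f"unknown layout {layout!r}; expected one of {LAYOUTS}")
    return layout


def cache_relpath(url: str, layout: str = "hash") -> str:
    """Return the path of the cached copy of url, relative to the cache root."""
    layout = resolve_layout(layout)
    if layout in ("hash", "hashtree"):
        h = hashlib.sha256(url.encode("utf-8")).hexdigest()  # assumed hash
        if layout == "hash":
            return h  # flat: every file directly under the cache root
        # git-style fan-out: hash[:2]/hash[2:4]/hash[4:]
        return posixpath.join(h[:2], h[2:4], h[4:])
    parts = urlsplit(url)
    # Skip empty, "." and ".." segments so the result stays inside the cache root.
    segments = [s for s in parts.path.split("/") if s not in ("", ".", "..")]
    if layout == "url_filename":
        # Flat layout keyed by basename: distinct URLs sharing a filename collide.
        return segments[-1] if segments else parts.netloc
    return posixpath.join(parts.scheme, parts.netloc, *segments)
```

For example, with layout="url_fullpath" the URL http://domain/p1/p2/filename maps to http/domain/p1/p2/filename, whereas layout="url_filename" maps both http://a/x/filename and http://b/y/filename to the same filename, which is the collision mentioned above.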
