This repository was archived by the owner on Jun 26, 2023. It is now read-only.

Switch to sharding based on estimated directory size #87

@schomatis

Description


The background and motivation for this are in ipfs/kubo#8106, but this issue is self-contained.

Add an option similar to UseHAMTSharding that switches from a basic to a HAMT directory based on an estimated directory size.

Proposed option name (used only for this issue description; feel free to suggest another): HAMTShardingSize.

Directory size estimation: the aggregate byte length of all the Links in BasicDirectory's ProtoNode (namely their names and CIDs). This is only an estimate because we don't marshal/encode the underlying ProtoNode to get the exact block size (the block size being the motivation for sharding in the first place), but it is close enough given that BasicDirectory doesn't use the ProtoNode's data field.
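A minimal sketch of the estimate in Go, assuming a simplified, hypothetical `Link` type that models only a link's name and its raw binary CID bytes (the real ProtoNode link type carries more than this). It sums `len(name) + len(cid)` per link without any protobuf encoding, so it approximates rather than equals the serialized block size. The 34-byte CIDs in the example match the binary length of a CIDv0 (sha2-256 multihash).

```go
package main

import "fmt"

// Link is a hypothetical, simplified stand-in for a ProtoNode link:
// only the fields used by the size estimate are modeled.
type Link struct {
	Name string
	Cid  []byte // raw binary CID bytes (assumed already encoded)
}

// estimatedDirSize sums the byte length of each link's name and CID.
// It deliberately skips protobuf encoding, so the result approximates,
// rather than equals, the size of the serialized directory block.
func estimatedDirSize(links []Link) int {
	size := 0
	for _, l := range links {
		size += len(l.Name) + len(l.Cid)
	}
	return size
}

func main() {
	links := []Link{
		{Name: "a.txt", Cid: make([]byte, 34)},
		{Name: "photos", Cid: make([]byte, 34)},
	}
	fmt.Println(estimatedDirSize(links)) // 79 (5+34 + 6+34)
}
```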

  • Optional: we can cache the estimated size as an internal variable to avoid constant recomputation.
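The optional caching could look roughly like the sketch below: a hypothetical wrapper keeps a running total and adjusts it on every add, replace, or remove, so the estimate never requires a full pass over the links. The type and method names are illustrative only, not part of go-mfs or go-unixfs.

```go
package main

import "fmt"

// cachedSizeDir is a hypothetical directory wrapper that keeps a running
// size estimate instead of recomputing it over all links on every change.
type cachedSizeDir struct {
	entries map[string][]byte // name -> raw CID bytes
	estSize int               // cached sum of len(name)+len(cid)
}

func newCachedSizeDir() *cachedSizeDir {
	return &cachedSizeDir{entries: make(map[string][]byte)}
}

// addEntry inserts or replaces an entry and adjusts the cached estimate.
func (d *cachedSizeDir) addEntry(name string, cid []byte) {
	if old, ok := d.entries[name]; ok {
		// Replacing an existing entry: drop its old contribution first.
		d.estSize -= len(name) + len(old)
	}
	d.entries[name] = cid
	d.estSize += len(name) + len(cid)
}

// removeEntry deletes an entry (if present) and adjusts the cached estimate.
func (d *cachedSizeDir) removeEntry(name string) {
	if old, ok := d.entries[name]; ok {
		d.estSize -= len(name) + len(old)
		delete(d.entries, name)
	}
}

func main() {
	d := newCachedSizeDir()
	d.addEntry("a.txt", make([]byte, 34))
	d.addEntry("b.txt", make([]byte, 34))
	d.removeEntry("a.txt")
	fmt.Println(d.estSize) // 39
}
```

The replace branch in `addEntry` matters: without it, overwriting an existing name would double-count that entry.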

This option will work in tandem with the global UseHAMTSharding; either of the two can trigger the HAMT transition. Any plans for the deprecation of UseHAMTSharding are outside of the scope of this issue.
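The "either of the two can trigger" rule reduces to a small predicate. In this sketch, `shouldUseHAMT` is a hypothetical helper; treating a threshold of 0 as "size-based switching disabled" is an assumption, not something this issue specifies. "Grows above" is read as strictly greater than the threshold.

```go
package main

import "fmt"

// shouldUseHAMT reports whether a basic directory should become a HAMT.
// useHAMTSharding mirrors the existing global option; shardingSize is the
// proposed HAMTShardingSize threshold in bytes. A shardingSize of 0 is
// assumed here to disable size-based switching.
func shouldUseHAMT(useHAMTSharding bool, shardingSize, estimatedSize int) bool {
	if useHAMTSharding {
		return true // the global option forces HAMT for every directory
	}
	return shardingSize > 0 && estimatedSize > shardingSize
}

func main() {
	fmt.Println(shouldUseHAMT(true, 0, 10))      // true
	fmt.Println(shouldUseHAMT(false, 1024, 512)) // false
	fmt.Println(shouldUseHAMT(false, 1024, 2048)) // true
}
```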

Known drawbacks (inherited from the current design), mentioned here just to make sure stakeholders are in sync:

  • We do not transition back from a HAMT to a basic directory: once a HAMTDirectory, always a HAMTDirectory. There won't be any system of high and low watermarks; once the estimated directory size grows above HAMTShardingSize, we switch and that is it.
  • There is no way to request a HAMT directory for a particular case. If the user knows from the start that directory D, and only directory D, will have, say, thousands of entries, and would like to make it a HAMT directory from the start to avoid the (relatively expensive) switch down the road, they are forced to use the global UseHAMTSharding option for all directories, not just directory D.

The logic that switches from a basic to a HAMT directory currently lives here in the MFS repo. It should actually live in UnixFS: MFS shouldn't need to know what type of directory it is manipulating; it only needs the Directory interface to mount its mutable FS (the sole objective of this layer). This is evidenced by the fact that the UseHAMTSharding option itself is a UnixFS option (which go-ipfs sets directly). If we can fix this in #86 before proceeding here, we will implement the logic described here in UnixFS instead; otherwise, HAMTShardingSize will be added to the MFS layer alongside the global option in addUnixFSChild.
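A rough sketch of what "the switch lives behind the Directory interface" could mean: a hypothetical `dynamicDir` owned by the UnixFS layer wraps either a basic or a HAMT implementation and converts one-way once the estimate exceeds the threshold, so a caller such as MFS only ever sees the interface. Everything here (`Directory`, `basicDir`, `hamtDir`, `dynamicDir`) is a simplified stand-in, not the real go-unixfs API; `hamtDir` is a placeholder with no actual sharding.

```go
package main

import "fmt"

// Directory is a hypothetical, minimal interface standing in for what
// MFS needs; it is not the real go-unixfs Directory interface.
type Directory interface {
	AddChild(name string, cid []byte)
	Kind() string
}

// basicDir tracks a cached size estimate (sum of len(name)+len(cid)).
type basicDir struct {
	links   map[string][]byte
	estSize int
}

func (b *basicDir) AddChild(name string, cid []byte) {
	if old, ok := b.links[name]; ok {
		b.estSize -= len(name) + len(old)
	}
	b.links[name] = cid
	b.estSize += len(name) + len(cid)
}

func (b *basicDir) Kind() string { return "basic" }

// hamtDir is a placeholder for a sharded directory (no real HAMT here).
type hamtDir struct{ links map[string][]byte }

func (h *hamtDir) AddChild(name string, cid []byte) { h.links[name] = cid }
func (h *hamtDir) Kind() string                     { return "hamt" }

// dynamicDir hides the basic-to-HAMT switch from its callers.
type dynamicDir struct {
	Directory
	threshold int // hypothetical HAMTShardingSize, in bytes
}

func (d *dynamicDir) AddChild(name string, cid []byte) {
	d.Directory.AddChild(name, cid)
	if b, ok := d.Directory.(*basicDir); ok && b.estSize > d.threshold {
		// One-way conversion: move the links into the HAMT placeholder.
		// Nothing ever converts a hamtDir back into a basicDir.
		d.Directory = &hamtDir{links: b.links}
	}
}

func main() {
	var dir Directory = &dynamicDir{
		Directory: &basicDir{links: map[string][]byte{}},
		threshold: 60,
	}
	dir.AddChild("a.txt", make([]byte, 34)) // estimate 39
	fmt.Println(dir.Kind())                 // basic
	dir.AddChild("b.txt", make([]byte, 34)) // estimate 78 > 60
	fmt.Println(dir.Kind())                 // hamt
}
```

Because the conversion check only ever runs from the basic side, the sketch also reflects the "once a HAMTDirectory, always a HAMTDirectory" behavior listed among the drawbacks.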

Metadata


Assignees

Labels

kind/enhancement: A net-new feature or improvement to an existing feature

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
