Switch to sharding based on estimated directory size


The background and motivation for this is in https://github.com/ipfs/go-ipfs/issues/8106, but this is a self-contained issue.

Add an option similar to `UseHAMTSharding` that switches from basic to HAMT directory based on an approximated directory size.

Proposed option's name (just for the sake of this issue description; feel free to suggest any other): `HAMTShardingSize`.

Directory size estimation: the sum of the byte lengths of all of `BasicDirectory.ProtoNode`'s `Link`s (namely their name and CID). This is only an estimate because we don't marshal/encode the underlying `ProtoNode` to get the exact block size (keeping the block size bounded is the motivation for sharding in the first place), but it is close enough given that `BasicDirectory` doesn't use the `ProtoNode`'s data field.
* Optional: we can cache the estimated size as an internal variable to avoid constant recomputation.
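A minimal sketch of the estimation with the optional cache, to make the accounting concrete. The `link` and `basicDir` types below are hypothetical stand-ins (not the real `BasicDirectory` or `ProtoNode` API), and the 34-byte CID length is just an example value:

```go
package main

import "fmt"

// link is a hypothetical stand-in for a ProtoNode link:
// the entry name plus the raw bytes of the CID it points to.
type link struct {
	Name string
	Cid  []byte
}

// basicDir is a hypothetical, simplified basic directory that
// caches its estimated size instead of recomputing it on every call.
type basicDir struct {
	links         map[string]link
	estimatedSize int // cached sum of len(name) + len(cid) over all links
}

func newBasicDir() *basicDir {
	return &basicDir{links: make(map[string]link)}
}

// linkSize is the per-link contribution to the estimate.
func linkSize(l link) int {
	return len(l.Name) + len(l.Cid)
}

// AddChild adds or replaces an entry and keeps the cached estimate in sync.
func (d *basicDir) AddChild(name string, cid []byte) {
	if old, ok := d.links[name]; ok {
		d.estimatedSize -= linkSize(old) // replacing an existing entry
	}
	l := link{Name: name, Cid: cid}
	d.links[name] = l
	d.estimatedSize += linkSize(l)
}

// RemoveChild drops an entry (if present) and updates the cached estimate.
func (d *basicDir) RemoveChild(name string) {
	if old, ok := d.links[name]; ok {
		d.estimatedSize -= linkSize(old)
		delete(d.links, name)
	}
}

func main() {
	d := newBasicDir()
	cid := make([]byte, 34) // example CID length, an assumption
	d.AddChild("a.txt", cid)
	d.AddChild("b.txt", cid)
	fmt.Println(d.estimatedSize) // 2 * (5 + 34) = 78
	d.RemoveChild("a.txt")
	fmt.Println(d.estimatedSize) // 39
}
```

Updating the cached value on add and remove keeps the estimate O(1) per operation, at the cost of having to touch it in every code path that mutates the links.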

This option will work in tandem with the global `UseHAMTSharding`; either of the two can trigger the HAMT transition. Any plans for the deprecation of `UseHAMTSharding` are *outside* of the scope of this issue.

Known drawbacks (inherited from current design) mentioned here just to make sure stakeholders are in sync:
* We do not transition back from HAMT to a basic directory. Once a `HAMTDirectory`, *always* a `HAMTDirectory`. There won't be any system of high and low watermarks: once the estimated directory size grows above `HAMTShardingSize`, we switch and that is it.
* There is no logic to signal that a HAMT directory should be used for a particular case. If the user knows from the start that directory D, and only directory D, will have, say, thousands of entries, and would like to make it a HAMT directory from the start to avoid the (relatively expensive) switch down the road, they are forced to use the global `UseHAMTSharding` option for *all* directories, not just directory D.
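The trigger rule and the one-way transition described above could be sketched as follows. This is only an illustration under stated assumptions: the `options` struct, the `dirKind` type, and `nextKind` are hypothetical names, and treating `HAMTShardingSize == 0` as "size-based switching disabled" is an assumption not settled in this issue.

```go
package main

import "fmt"

// options is a hypothetical container for the two settings discussed here.
type options struct {
	UseHAMTSharding  bool // existing global switch
	HAMTShardingSize int  // proposed threshold in bytes; 0 = disabled (assumption)
}

// dirKind is a hypothetical tag for the directory implementation in use.
type dirKind int

const (
	basicKind dirKind = iota
	hamtKind
)

// nextKind decides which implementation a directory should use after a change.
// Either option can trigger the switch, and the switch is never undone.
func nextKind(current dirKind, estimatedSize int, o options) dirKind {
	if current == hamtKind {
		return hamtKind // no transition back, no low watermark
	}
	if o.UseHAMTSharding {
		return hamtKind // global option shards every directory
	}
	if o.HAMTShardingSize > 0 && estimatedSize > o.HAMTShardingSize {
		return hamtKind // estimated size grew above the threshold
	}
	return basicKind
}

func main() {
	o := options{HAMTShardingSize: 1000}
	fmt.Println(nextKind(basicKind, 500, o) == basicKind) // below threshold: stays basic
	fmt.Println(nextKind(basicKind, 1500, o) == hamtKind) // above threshold: switches
	fmt.Println(nextKind(hamtKind, 10, o) == hamtKind)    // shrinking does not switch back
}
```

Note that the sketch has no per-directory override, which mirrors the second drawback: the only way to start directory D as a HAMT is the global flag.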

------------------------------------------------------

The logic that switches from a basic to a HAMT directory lives here in the MFS repo. It should actually live in UnixFS: MFS shouldn't need to know what type of directory it is manipulating; it only needs the `Directory` interface to mount its mutable FS (the sole objective of this layer). This is evidenced by the fact that the `UseHAMTSharding` option itself is a UnixFS option (that `go-ipfs` sets directly). If we can fix this in https://github.com/ipfs/go-mfs/issues/86 before proceeding here, we will implement the logic described here in UnixFS instead; otherwise, the `HAMTShardingSize` option will be added to the MFS layer alongside the global option in `addUnixFSChild`.
