Switch to sharding based on estimated directory size #87
Description
The background and motivation for this is in ipfs/kubo#8106, but this is a self-contained issue.
Add an option similar to `UseHAMTSharding` that switches from a basic to a HAMT directory based on an approximated directory size. Proposed option's name (just for the sake of this issue description; feel free to suggest any other): `HAMTShardingSize`.
Directory size estimation: aggregate byte length of all of `BasicDirectory.ProtoNode`'s `Link`s (namely their name and CID). This is only an estimation because we don't marshal/encode the underlying `ProtoNode` to get the exact block size (which is the motivation for the sharding in the first place), but it is close enough given that the `BasicDirectory` doesn't use the `ProtoNode`'s data field.
- Optional: we can cache the estimated size as an internal variable to avoid constant recomputation.
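
A minimal sketch of the estimation and the optional cached value, assuming a simplified entry type. The names `link`, `linkSize`, `sizeTracker` and `estimateSize` are hypothetical and only illustrate the idea; the real code would walk the links of the `BasicDirectory`'s `ProtoNode` instead.

```go
package main

import "fmt"

// link is a hypothetical stand-in for one directory entry (name plus CID).
// It is not the real link type used by ProtoNode.
type link struct {
	Name     string
	CidBytes []byte // binary form of the CID
}

// linkSize approximates the bytes one entry contributes to the block:
// the byte length of its name plus the byte length of its CID.
func linkSize(l link) int {
	return len(l.Name) + len(l.CidBytes)
}

// estimateSize recomputes the estimate from scratch over all entries.
func estimateSize(links []link) int {
	total := 0
	for _, l := range links {
		total += linkSize(l)
	}
	return total
}

// sizeTracker caches the running estimate so that adding or removing an
// entry adjusts it in constant time instead of re-walking every link.
type sizeTracker struct {
	estimated int
}

func (s *sizeTracker) add(l link)    { s.estimated += linkSize(l) }
func (s *sizeTracker) remove(l link) { s.estimated -= linkSize(l) }

func main() {
	cid := make([]byte, 34) // a CIDv0 is a 34-byte sha2-256 multihash
	a := link{Name: "a.txt", CidBytes: cid}
	b := link{Name: "b", CidBytes: cid}

	tracker := &sizeTracker{}
	tracker.add(a)
	tracker.add(b)
	fmt.Println(tracker.estimated, estimateSize([]link{a, b}))

	tracker.remove(a)
	fmt.Println(tracker.estimated, estimateSize([]link{b}))
}
```

The cached value and the full recomputation should always agree; the cache only saves the walk over all links on each change.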
This option will work in tandem with the global `UseHAMTSharding`; either of the two can trigger the HAMT transition. Any plans for the deprecation of `UseHAMTSharding` are outside of the scope of this issue.
Known drawbacks (inherited from current design) mentioned here just to make sure stakeholders are in sync:
- We do not transition back from HAMT to a basic directory. Once a `HAMTDirectory`, always a `HAMTDirectory`. There won't be any system of high and low watermarks: once the estimated directory size grows above `HAMTShardingSize` we switch and that is it.
- There is no logic to signal the use of a HAMT directory for a particular case. If the user knows from the start that directory D, and only directory D, will have, say, thousands of entries, and would like to make it a HAMT directory from the start to avoid the (relatively expensive) switch down the road, they are forced to use the global `UseHAMTSharding` option for all directories, not just directory D.
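
The trigger rule and the one-way transition described above could be sketched as follows. This is an illustration only: the `options` and `dir` types and the `shouldSwitch`, `addEntry` and `removeEntry` helpers are hypothetical, and treating a `HAMTShardingSize` of 0 as "disabled" is an assumption that this issue does not specify.

```go
package main

import "fmt"

// options mirrors the two settings discussed in this issue.
type options struct {
	UseHAMTSharding  bool // existing global switch
	HAMTShardingSize int  // proposed threshold in bytes; 0 = disabled (assumption)
}

// dir is a hypothetical directory that tracks its estimated size.
type dir struct {
	isHAMT        bool
	estimatedSize int
}

// shouldSwitch reports whether either option triggers the HAMT transition.
func shouldSwitch(o options, estimatedSize int) bool {
	if o.UseHAMTSharding {
		return true
	}
	return o.HAMTShardingSize > 0 && estimatedSize > o.HAMTShardingSize
}

// addEntry grows the estimate and switches to HAMT once a trigger fires.
func (d *dir) addEntry(o options, entrySize int) {
	d.estimatedSize += entrySize
	if !d.isHAMT && shouldSwitch(o, d.estimatedSize) {
		d.isHAMT = true
	}
}

// removeEntry shrinks the estimate but never switches back to a basic
// directory: there is no low watermark.
func (d *dir) removeEntry(entrySize int) {
	d.estimatedSize -= entrySize
}

func main() {
	opts := options{HAMTShardingSize: 100}
	d := &dir{}

	d.addEntry(opts, 60)
	fmt.Println(d.isHAMT) // still a basic directory

	d.addEntry(opts, 60)
	fmt.Println(d.isHAMT) // estimate is above the threshold

	d.removeEntry(100)
	fmt.Println(d.isHAMT) // stays HAMT even though the estimate dropped
}
```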
The logic for switching from a basic to a HAMT directory lives here in the MFS repo. It should actually live in UnixFS: MFS shouldn't need to know what type of directory it is manipulating; it only needs the `Directory` interface to mount its mutable FS (the sole objective of this layer). This is clearly evidenced by the fact that the `UseHAMTSharding` option itself is a UnixFS option (that `go-ipfs` sets directly). If we can fix this in #86 before proceeding here, we will implement the logic described here in UnixFS instead; otherwise `HAMTShardingSize` will be added to the MFS layer alongside the global option in `addUnixFSChild`.