UnixFS Reboot

# TLDR;

I’ve listed every feature I can find that has been considered for UnixFSv2 below. We discussed this in a short meeting (notes at the end of the document, recording posted soon) and the following action items surfaced:

* @mikeal will kick off [an issue in `ipfs/specs`](https://github.com/ipfs/specs/issues/217) to add file metadata to **UnixFSv1**
* @mikeal will kick off an issue in this repo to define and scope a **UnixFSv2** we can ship on a reasonable timeline.

# UnixFS vNext Reboot
		
For some time we’ve been directing issues, feature requests, and the general future of UnixFS at “UnixFSv2.” Because the size and scope of this future version were never locked down, this has delayed improvements to UnixFSv1 and left UnixFSv2 without a clear deadline or set of functionality.

The goal of this document is to describe the various issues and features we’d like to see in UnixFS and link to the historical discussions about those features. We can then use this document to discuss and prioritize each feature and find the best path to development, whether that is improvements to UnixFSv1, an incremental UnixFSv2 on `dag-cbor`, or a bigger future version built on features that are still being researched.

## General Links

* [Requirements 2017](https://github.com/ipfs/unixfs-v2/issues/1)
* [UnixFSv1 -> v2 upgrade path](https://github.com/ipfs/unixfs-v2/issues/24)
* [Prioritizing UnixFSv2](https://github.com/ipfs/roadmap/issues/19)
* [UnixFSv2 Draft Implementation in JS](https://github.com/ipld/js-unixfsv2)

## Development Targets

This section briefly describes the difficulties and limitations of the different development strategies, which should help inform how best to approach each issue.

### Improvements to UnixFSv1

One problem with improving UnixFSv1 is that none of the *generic* improvements we make can be leveraged by applications outside of IPFS. For instance, the work we’ve done for directory sharding lives in UnixFSv1 and can’t be used for other generic sharding problems. This means that solving fairly generic problems in UnixFSv1 is less valuable and eventually leads to duplicated effort.

The other problem is `dag-pb`, [best summarized by @Stebalien](https://github.com/ipfs/unixfs-v2/issues/1#issuecomment-338301334). In short, it’s very rigid, and adding fields and other features is more cumbersome than in `dag-cbor`.

### UnixFSv2 on `dag-cbor` soonish

This development route solves the `dag-pb` related issues and makes *some* of the generic improvements usable outside of IPFS.

However, there is one major problem remaining: upgradability. All new features and improvements must exist and be relatively consistent between two versions of IPFS manipulating the same data. There is no good way to ensure this without future IPLD features that are still in the research phase.

This route of development is most problematic when tackling the “Reproducible Hashes” issue.

It should also be noted that there is undeveloped future IPLD work that we want to leverage for UnixFS. If we were to release this version of UnixFSv2, we can be fairly certain that another major version migration would still follow at some point.

The actual development time for this would not be very long. @mikeal has already written draft implementations of several iterations of the UnixFSv2 spec in JS. A much more important factor to consider is the upgrade cost to IPFS users.

### UnixFSv2 on “IPLD Future”

Most of the big problems facing UnixFS are problems facing IPLD generally. These problems are all being actively worked on in the form of engineering and research and at some future date can be leveraged for an ideal, future-proof (upgradable), version of UnixFS. However, *when* this will be available can’t be predicted with a high level of certainty.

# Issues

## Standard File/Directory metadata

* [Permissions](https://github.com/ipfs/unixfs-v2/issues/14)
  * Executable bit
  * Ownership (user and group)
* Filename *in* file object
* Number of files in directory (HAMT)
* [Cumulative size of files in directory](https://github.com/ipfs/unixfs-v2/issues/7)
* mtime
* mtime as BigInt
* [content-type](https://github.com/ipfs/unixfs-v2/issues/11)
 
### Links
* [Comment Sept 2017](https://github.com/ipfs/unixfs-v2/issues/1#issuecomment-330285604)

## Arbitrary file metadata

The ability for users to add their own optional metadata to files could be very useful. However, doing *arbitrary anything* in `dag-pb` is problematic.

* [Add metadata field](https://github.com/ipfs/unixfs-v2/issues/22)

## Reproducible Hashing

Put simply, this is the ability for a UnixFS implementation to take a file on a traditional file system and an existing UnixFS encoding of that file, and to reproduce that encoding identically.

This feature is relatively simple if there is no optionality and every version of IPFS is in perfect alignment. However, this is almost never the case.

IPFS has several options that can be used when encoding a file, and these alter the resulting encoding.

One path is to encode all options into the encoded version of the file. This would work as long as both versions of IPFS are in alignment, which means it can still fail to produce identical hashes in upgrade scenarios. The only way to *completely* guarantee reproducible hashing is to guarantee that the *applications* are also identical, but this is very difficult without “IPLD Future.”
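As a rough illustration of why import options matter, here is a minimal sketch in Python. It is not the UnixFS format: `toy_root_hash`, `encode`, and `reproduces` are hypothetical helpers, the "root" is just a SHA-256 over chunk digests rather than a real CID, and chunk size stands in for the full set of import options. It shows that the same bytes give a different root under a different option, and that recording the option in the root record lets an aligned implementation reproduce it.

```python
import hashlib
from typing import Dict, List


def chunk_fixed(data: bytes, size: int) -> List[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def toy_root_hash(data: bytes, chunk_size: int) -> str:
    """Hash every chunk, then hash the concatenated chunk digests.

    A toy stand-in for a DAG root; not the real UnixFS layout or a CID.
    """
    leaf_digests = [hashlib.sha256(c).digest() for c in chunk_fixed(data, chunk_size)]
    return hashlib.sha256(b"".join(leaf_digests)).hexdigest()


def encode(data: bytes, chunk_size: int) -> Dict[str, object]:
    """Return a hypothetical root record that stores the options used."""
    return {"options": {"chunk_size": chunk_size}, "root": toy_root_hash(data, chunk_size)}


def reproduces(record: Dict[str, object], data: bytes) -> bool:
    """Re-import data with the recorded options and compare roots."""
    options = record["options"]
    return toy_root_hash(data, options["chunk_size"]) == record["root"]


data = bytes(range(256)) * 64  # 16 KiB of sample bytes

record = encode(data, chunk_size=1024)
same_bytes_other_option = toy_root_hash(data, chunk_size=4096)

print(record["root"] == same_bytes_other_option)  # differing options, differing roots
print(reproduces(record, data))                   # recorded options allow a re-import
```

Note that this only works because both "implementations" share `toy_root_hash` exactly. If a newer version changed how chunks are combined, the recorded options alone would no longer be enough, which is the upgrade problem described above.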

* [Reproducible file imports, Sept 2018](https://github.com/ipfs/unixfs-v2/issues/15)
* [Deep Dive IPFSCamp 2019: Deterministic CIDs! Reproducible File Imports! Verifiable HTTP Gateways!](https://github.com/ipfs/camp/blob/dccfa742e3fc8bc94f747f6202f55d428a0f3e6a/DEEP_DIVES/35-deterministic-cids-reproducible-file-imports-verifiable-http-gateways.md)

## “Inline” files and directories

For small files and directories the benefits of de-duplication are often outweighed by the cost of retrieving additional blocks.

There are also use cases, like websites, where it may be highly beneficial to inline certain data into the root block of the directory tree for faster early rendering.
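The trade-off can be sketched as follows. This is only an illustration under assumptions: `INLINE_THRESHOLD`, `build_directory`, and the `{"inline": ...}` / `{"link": ...}` entry shapes are hypothetical and do not come from any UnixFS spec. Files at or under the threshold are stored directly in the directory entry, so a reader needs no extra block fetch for them; larger files are stored as separate blocks and referenced by digest.

```python
import hashlib
from typing import Dict, Tuple

INLINE_THRESHOLD = 64  # bytes; hypothetical cut-off chosen for the example


def build_directory(
    files: Dict[str, bytes], threshold: int = INLINE_THRESHOLD
) -> Tuple[Dict[str, dict], Dict[str, bytes]]:
    """Return (directory entries, separate blocks keyed by SHA-256 hex digest)."""
    entries: Dict[str, dict] = {}
    blocks: Dict[str, bytes] = {}
    for name, content in sorted(files.items()):
        if len(content) <= threshold:
            # Small file: embed the bytes in the directory block itself.
            entries[name] = {"inline": content}
        else:
            # Large file: store once as its own block and link to it.
            digest = hashlib.sha256(content).hexdigest()
            blocks[digest] = content
            entries[name] = {"link": digest}
    return entries, blocks


site = {
    "index.html": b"<h1>hello</h1>",
    "style.css": b"body{margin:0}",
    "video.bin": b"\x00" * 4096,
}
entries, blocks = build_directory(site)
print(sorted(entries))  # all three names are listed in the directory
print(len(blocks))      # only the large file became a separate block
```

In this sketch, rendering `index.html` and `style.css` requires only the directory block, while `video.bin` still benefits from de-duplication as a separately addressed block.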

* [Inlining small files, Nov 2017](https://github.com/ipfs/unixfs-v2/issues/4)
* [Graph Agnostic, Oct 2018](https://github.com/ipfs/unixfs-v2/issues/18)

## Support for non-utf8 Filenames
 
[Link](https://github.com/ipfs/unixfs-v2/issues/3)

## Seeking in large directories

It’s often necessary to paginate through large directories, and the current implementations do not easily support this.

**Question:** *Given that you can only paginate through a randomized ordering using the current sharding data structure, how useful would this be without ordered collections?*
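To make the question concrete, here is a small sketch of cursor-based pagination over a hash-ordered listing. It is an assumption-laden toy: SHA-256 stands in for whatever hash the sharded directory actually uses, and `hash_order` and `page` are hypothetical helpers, not an existing API. Pages are stable and together cover every entry, but the order a caller sees follows the hash of each name rather than the name itself.

```python
import hashlib
from typing import List, Optional


def hash_order(names: List[str]) -> List[str]:
    """Order names by the digest of the name, as a hash-keyed structure would."""
    return sorted(names, key=lambda n: hashlib.sha256(n.encode("utf-8")).digest())


def page(names: List[str], after: Optional[str], limit: int) -> List[str]:
    """Return up to `limit` names that follow the cursor `after` in hash order.

    Raises ValueError if the cursor entry no longer exists in the directory.
    """
    ordered = hash_order(names)
    start = 0 if after is None else ordered.index(after) + 1
    return ordered[start:start + limit]


names = ["file-%03d.txt" % i for i in range(10)]

collected: List[str] = []
cursor: Optional[str] = None
while True:
    batch = page(names, cursor, limit=4)
    if not batch:
        break
    collected.extend(batch)
    cursor = batch[-1]

print(collected)  # every name once, in hash order rather than name order
```

A consumer that needs "the next 4 entries" is served by this; a consumer that needs "entries starting at `file-005`" in name order is not, which is why ordered collections matter for the usefulness of this feature.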

* [Support seeking in large directories, Mar 2018](https://github.com/ipfs/unixfs-v2/issues/6)

## Symlinks

[Link](https://github.com/ipfs/unixfs-v2/pull/16)

## Protobuf Performance

While I’ve heard people say on numerous occasions that `dag-pb` performance is an issue (compared to `dag-cbor`), I can’t find any good links or resources on what the real impact of this is.


## Miscellaneous 

* [Size fields to keep or potentially remove](https://github.com/ipfs/unixfs-v2/issues/9)
* [Support for other hash linked filesystems](https://github.com/ipfs/unixfs-v2/issues/23)
* [Slicing chunks](https://github.com/ipfs/unixfs-v2/issues/25)
* [Comment: “Our plan is to switch to rabin (or similar), CIDv1, raw leaves, UnixFSv2 etc. all in one go.”](https://github.com/ipfs/go-ipfs-chunker/issues/13#issuecomment-482831094)
* [UnixFSv2 spike in IPLD Schema](https://github.com/ipld/specs/blob/1cf6e030576c684efba44bfb16485d873b68ac2b/design-history/exploration-reports/2019.06-unixfsv2-spike-01.md)

# Meeting Notes: August 8th 2019

- performance things
    - issues with old unixfs hamt
        - batching issues
        - fans out at the bottom way too fast
        - really deep tree even in cases where that's unnecessary
- questions about external information we can feed into priorities
    - some other major user stories about high level apis have also come up...
        - it's hard to add directories to ipfs currently without re-scanning all files... incremental adds wanted
            - this is very much edge tooling and not unixfsv2 asks
    - "we took everything that was blocked on unixfsv2 off our q3 list"
        - doesn't mean we don't still want it, just choose to route elsewhere in other teams :)
        - ... additional comments about "these workaround are terrible"

- generation style versioning?
- more worried about changes to things like rabin chunking than anything else
    - moves (cancels dedup of) vast amounts of the data
    - changing metadata much lighter comparatively (still not free)
- some kinds of data might be easier to maintain read support for and maybe that's useful?
    - e.g. concatenating all the bytes in a `[][]byte` is easy, even if chunker to write it changed
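
The point about reads can be sketched in a few lines. This is a toy under assumptions (`chunk_fixed` and `read_back` are hypothetical helpers, and a flat list of byte strings stands in for `[][]byte`): however the writer chose to chunk the data, the reader only concatenates, so a change of chunker does not require a change to the read path.

```python
from typing import List


def chunk_fixed(data: bytes, size: int) -> List[bytes]:
    """Writer side: split data into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def read_back(chunks: List[bytes]) -> bytes:
    """Reader side: concatenate chunks without knowing how they were cut."""
    return b"".join(chunks)


payload = b"the quick brown fox jumps over the lazy dog"

old_layout = chunk_fixed(payload, 3)  # chunks written by an "old" chunker
new_layout = chunk_fixed(payload, 7)  # chunks written by a "new" chunker

print(old_layout != new_layout)                                     # layouts differ
print(read_back(old_layout) == read_back(new_layout) == payload)   # reads agree
```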

- worth mentioning that dir list order in most existing filesystems isn't... really specified.
    - you can't seek it -- there are no syscalls for that.


### anyone wanna talk about attribs?

https://gist.github.com/warpfork/3948bd951e93c0f0b4e355d78b736f83

- we should ping djdv on this as well
