Description
The HTTP Gateway determines the returned content-type by sniffing the content (based on golang.org/src/net/http/sniff.go) and by looking at the file extension. js-ipfs uses a similar setup.
Problem: there is no mechanism for a website creator to override the returned content-type, and setting a custom file extension works only for some file types.
Example
The same data produces a different content-type depending on the request path.
SVG image
https://ipfs.io/ipfs/QmVdFJJBiQkVKFcvXu4WzySbZ7KnCW6uGWLJqZz5FnRWjk/ipfs-logo.svg
→ returned as image/svg+xml
XML document
https://ipfs.io/ipfs/QmVdFJJBiQkVKFcvXu4WzySbZ7KnCW6uGWLJqZz5FnRWjk/ipfs-logo.xml
→ returned as text/xml
Unknown extension
https://ipfs.io/ipfs/QmVdFJJBiQkVKFcvXu4WzySbZ7KnCW6uGWLJqZz5FnRWjk/ipfs-logo.foo
→ returned as text/plain
Raw CID
https://ipfs.io/ipfs/QmTqZhR6f7jzdhLgPArDPnsbZpvvgxzCZycXK7ywkLxSyU
→ returned as text/plain
Raw CID + explicit filename
https://ipfs.io/ipfs/QmTqZhR6f7jzdhLgPArDPnsbZpvvgxzCZycXK7ywkLxSyU?filename=/ipfs-logo.svg
→ returned as image/svg+xml
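The behaviour above can be approximated with Go's standard library. This is a minimal sketch, not the gateway's actual code: guessContentType is a hypothetical helper that tries the file extension first and falls back to sniffing. The SVG payload is a stand-in, not the real ipfs-logo file.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"path"
)

// guessContentType is a hypothetical helper (not part of any IPFS codebase):
// it prefers the file extension and falls back to content sniffing.
func guessContentType(name string, data []byte) string {
	if ctype := mime.TypeByExtension(path.Ext(name)); ctype != "" {
		return ctype
	}
	// http.DetectContentType looks at no more than the first 512 bytes.
	return http.DetectContentType(data)
}

func main() {
	// Stand-in payload: an SVG document without an XML declaration.
	svg := []byte(`<svg xmlns="http://www.w3.org/2000/svg"></svg>`)

	// An empty name models a request for a raw CID with no filename hint.
	for _, name := range []string{"ipfs-logo.svg", "ipfs-logo.foo", ""} {
		fmt.Printf("%q -> %s\n", name, guessContentType(name, svg))
	}
}
```

With a known extension the lookup table decides the type. With an unknown extension or no name at all, the result depends entirely on the sniffer, which only recognises a fixed set of signatures. This is why the same bytes come back as text/plain under some paths.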
Motivation
We want IPFS to become a viable solution for hosting websites.
At the HTTP level, as a bare minimum, website owners expect to be able to override:
- content-type of specific files / file types
- error pages (4xx, 5xx)
Ideas to explore
(A) Embedding content-type in DAG-PB (UnixFS metadata)
One way to address this is to support embedding Content-Type in UnixFS DAG metadata.
It would be opt-in (like mode and mtime).
TBD if filename should override the content type embedded in the DAG.
This is tracked in ipfs/specs#364
(B) Drop-in config to override content-type per directory
@warpfork noted that DAG metadata may not be the best place for storing content-type:
ipfs/specs#217 (comment)
+1 towards the idea that if [Content] type is getting well-known support, it should be something we move towards the gateway knowing of it, rather than making it a feature of the filesystem. This would be a much closer set of relationships to how the rest of the world works already (e.g. doing sysadmin today with nginx or something, I would generally configure [Content] types at the webserver area, and not in filesystem metadata) -- and thus seems much less likely to go awry.
Carefully avoiding baking in the idea of a single "mimetype string" field into our filesystem metadata also leaves much more room for issues to evolve around the things Ian mentioned:
- a file can have multiple mime types depending on the context
- some mime types can't be deduced until the entire file has been read
My take on this is:
- mind, we did exactly the opposite with mtime and mode: UnixFS 1.5 embeds them in dag-pb
- we could support both ways. E.g., a website creator would add something like _headers to the directory, and the Gateway would do the right thing when a resource from that directory or its subdirectories is requested
  - presence of the config file would disable content sniffing on both server and client (X-Content-Type-Options: nosniff)
See _headers in ipfs/specs#257
References
- Storing Explicit Content Type: ipld/legacy-unixfs-v2#11
- SVG files being sniffed incorrectly (Gateway content-type response header ipfs-inactive/faq#224 (comment), fix: fix content-type by doing a fall-back using extensions js-ipfs-http-response#5, fix: fix content-type detection by doing a fall-back based on the ext… js-ipfs#1482)
- prior art for drop-in config that travels with data