Content Type set by HTTP Gateway #152

Open
@lidel

The HTTP Gateway does content-type sniffing based on golang.org/src/net/http/sniff.go and the file extension. js-ipfs uses a similar setup.

Problem: there is no mechanism for a website creator to override the returned content-type, and setting a custom file extension works only for some file types.

Example

The same data produces a different content-type depending on the request path.

SVG image

https://ipfs.io/ipfs/QmVdFJJBiQkVKFcvXu4WzySbZ7KnCW6uGWLJqZz5FnRWjk/ipfs-logo.svg
→ returned as image/svg+xml

XML document

https://ipfs.io/ipfs/QmVdFJJBiQkVKFcvXu4WzySbZ7KnCW6uGWLJqZz5FnRWjk/ipfs-logo.xml
→ returned as text/xml

Unknown extension

https://ipfs.io/ipfs/QmVdFJJBiQkVKFcvXu4WzySbZ7KnCW6uGWLJqZz5FnRWjk/ipfs-logo.foo
→ returned as text/plain

Raw CID

https://ipfs.io/ipfs/QmTqZhR6f7jzdhLgPArDPnsbZpvvgxzCZycXK7ywkLxSyU
→ returned as text/plain

Raw CID + explicit filename

https://ipfs.io/ipfs/QmTqZhR6f7jzdhLgPArDPnsbZpvvgxzCZycXK7ywkLxSyU?filename=/ipfs-logo.svg
→ returned as image/svg+xml
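
The results above are consistent with an "extension first, then sniff" lookup. Below is a minimal sketch of that order using only the Go standard library. It is not the gateway's actual code: `guessContentType` is a hypothetical helper, and the SVG snippet is stand-in data, not the real file from the examples.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"path"
)

// guessContentType is a hypothetical helper that tries the file extension
// first and falls back to sniffing the leading bytes. name is the last path
// segment, or the value of a ?filename= override; data is the start of the file.
func guessContentType(name string, data []byte) string {
	// 1. A known extension wins (results may vary with the system MIME tables).
	if ct := mime.TypeByExtension(path.Ext(name)); ct != "" {
		return ct
	}
	// 2. Otherwise sniff the content; only the first 512 bytes are considered.
	return http.DetectContentType(data)
}

func main() {
	// Stand-in SVG payload (assumption; the real file is not reproduced here).
	svg := []byte(`<svg xmlns="http://www.w3.org/2000/svg"></svg>`)
	for _, name := range []string{"ipfs-logo.svg", "ipfs-logo.xml", "ipfs-logo.foo", ""} {
		fmt.Printf("%q -> %s\n", name, guessContentType(name, svg))
	}
}
```

With this order, an unknown extension and a raw CID both end up in the sniffing branch, which is why a website creator cannot pick the type for them today.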

Motivation

We want IPFS to become a viable solution for hosting websites.
At the HTTP level, as a bare minimum, website owners expect to be able to override:

  • content-type of specific files / file types
  • error pages (4xx, 5xx)

Ideas to explore

(A) Embedding content-type in DAG-PB (UnixFS metadata)

One way to address this is to support embedding Content-Type in UnixFS DAG metadata.
It would be opt-in (like mode and mtime).
TBD whether the filename should override the content type embedded in the DAG.

This is tracked in ipfs/specs#364
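
To make the open precedence question concrete, here is a rough sketch of both options. Everything in it is hypothetical: `fileMeta`, its `ContentType` field, and the `filenameWins` switch are illustrative names only, and no such field exists in UnixFS today.

```go
package main

import (
	"fmt"
	"mime"
	"path"
)

// fileMeta is a HYPOTHETICAL view of opt-in UnixFS metadata.
// ContentType is nil when the creator did not embed a type.
type fileMeta struct {
	ContentType *string
}

// resolveContentType picks between the embedded type and the type implied by
// the filename. filenameWins selects which side of the TBD question is taken.
// An empty result means neither source knows the type (sniffing would follow).
func resolveContentType(meta fileMeta, filename string, filenameWins bool) string {
	fromName := mime.TypeByExtension(path.Ext(filename))
	if filenameWins && fromName != "" {
		return fromName
	}
	if meta.ContentType != nil && *meta.ContentType != "" {
		return *meta.ContentType
	}
	return fromName
}

func main() {
	embedded := "text/xml"
	meta := fileMeta{ContentType: &embedded}
	fmt.Println(resolveContentType(meta, "ipfs-logo.svg", false))
	fmt.Println(resolveContentType(meta, "ipfs-logo.svg", true))
}
```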

(B) Drop-in config to override content-type per directory

@warpfork noted that DAG metadata may not be the best place for storing content-type:

ipfs/specs#217 (comment)
+1 towards the idea that if [Content] type is getting well-known support, it should be something we move towards the gateway knowing of it, rather than making it a feature of the filesystem.

This would be a much closer set of relationships to how the rest of the world works already (e.g. doing sysadmin today with nginx or something, I would generally configure [Content] types at the webserver area, and not in filesystem metadata) -- and thus seems much less likely to go awry.

Carefully avoiding baking in the idea of a single "mimetype string" field into our filesystem metadata also leaves much more room for issues to evolve around the things Ian mentioned:

  1. a file can have multiple mime types depending on the context
  2. some mime types can't be deduced until the entire file has been read

My take on this is:

  • mind, we did exactly the opposite with mtime and mode: UnixFS 1.5 embeds them in dag-pb
  • we could support both ways, e.g., a website creator would add something like _headers to the directory, and the Gateway would do the right thing when a resource from that directory or its subdirectories is requested
    • the presence of the config file would disable content sniffing on both server and client (X-Content-Type-Options: nosniff)

See _headers in ipfs/specs#257

References

cc @olizilla @autonome
