Added magic for file types #3011

Closed
wants to merge 1 commit into from

jboero
Contributor

@jboero jboero commented Sep 4, 2023

Add this to your /etc/magic file to enable the file command.

jboero@xps ~/Downloads> file *llama*
codellama-7b.Q8_0.gguf:     GGUF LLM model version=1
llama-2-7b.ggmlv3.q8_0.bin: GGML/GGJT LLM model version=3
llama-cpp.srpm.spec:        ASCII text
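
The check behind output like the above can be approximated in a few lines of Python. This is a minimal sketch, not the rules from this PR. It assumes GGUF files begin with the ASCII bytes GGUF followed by a little-endian 32-bit version, and that the older GGJT/GGMF/GGML formats begin with a little-endian 32-bit magic number (the values below are as understood from llama.cpp's loaders; treat them as assumptions). The names sniff_model_header and sniff_model_file are hypothetical.

```python
import struct

# Assumed magic values (not taken from this PR).
GGUF_MAGIC = b"GGUF"  # four ASCII bytes at offset 0
LEGACY_MAGICS = {
    0x67676A74: "GGJT",  # versioned
    0x67676D66: "GGMF",  # versioned
    0x67676D6C: "GGML",  # unversioned, no version field follows
}


def sniff_model_header(header: bytes):
    """Return (format_name, version) or None if the header is not recognised.

    version is None when the format carries no version field or the
    header is too short to contain one.
    """
    if len(header) < 4:
        return None
    if header[:4] == GGUF_MAGIC:
        name = "GGUF"
    else:
        # Legacy formats store the magic as a little-endian uint32.
        (magic,) = struct.unpack("<I", header[:4])
        name = LEGACY_MAGICS.get(magic)
        if name is None:
            return None
        if name == "GGML":
            return (name, None)
    if len(header) < 8:
        return (name, None)
    # The version is a little-endian uint32 right after the magic.
    (version,) = struct.unpack("<I", header[4:8])
    return (name, version)


def sniff_model_file(path):
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_model_header(f.read(8))
```

As noted later in the thread, a magic number and version alone do not say which quantization or tensor layout a file uses, so this only reproduces what the file rule reports.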

@KerfuffleV2
Collaborator

I don't think this would go in grammars/, that directory is for the grammar sampling stuff.

file is a Unix tool, you probably shouldn't just add a config file for it with no explanation.

Also, your file configuration here doesn't actually have enough information to answer my question in #2990 about what type of GGML file you actually have. Knowing it's some kind of GGML/GGJT/GGMF file of some version isn't specific enough.

Also, at least on Linux, file supports -m, which lets you point it at a config file with info for these types. No need to add it to the system-wide /etc/magic.


I think if you wanted to add this it would make more sense to create a document in docs/ or something where you can show the example and include some information so people know how to use it and why they'd want to.
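
A doc along those lines could show the -m route with a throwaway magic file instead of touching /etc/magic. The sketch below is only illustrative: the rules are hypothetical entries written in magic(5) syntax, not the ones from this PR, and the file name llm.magic plus the helper names are made up for the example. It writes the rules to a directory of your choice and builds the file command line without running it.

```python
import os
import tempfile

# Hypothetical rules in magic(5) syntax (offset, type, test, message).
# Assumption: GGUF starts with the ASCII bytes "GGUF", GGJT with the
# little-endian uint32 0x67676a74, each followed by a uint32 version.
MAGIC_RULES = """\
0\tstring\tGGUF\tGGUF LLM model
>4\tlelong\tx\tversion=%d
0\tlelong\t0x67676a74\tGGML/GGJT LLM model
>4\tlelong\tx\tversion=%d
"""


def write_magic_file(directory):
    """Write the example rules to <directory>/llm.magic and return the path."""
    path = os.path.join(directory, "llm.magic")
    with open(path, "w", encoding="ascii") as f:
        f.write(MAGIC_RULES)
    return path


def file_command(magic_path, targets):
    """Build the argv for `file -m <magic_path> <targets...>`."""
    return ["file", "-m", magic_path, *targets]


if __name__ == "__main__":
    magic_path = write_magic_file(tempfile.mkdtemp())
    print(" ".join(file_command(magic_path, ["codellama-7b.Q8_0.gguf"])))
```

On a machine with file installed, the returned list can be passed to subprocess.run. Keeping the rules in a per-user file means nothing system-wide has to change while the rules are still being reviewed.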

@jboero
Contributor Author

jboero commented Sep 4, 2023

Okay, I can do that until it gets into upstream magic. I'm also working in a branch with a dedicated tool to give more details about the file contents, but this was low-hanging fruit for now.

@jboero
Contributor Author

jboero commented Sep 4, 2023

Is there an official mime type for models?

@staviq
Contributor

staviq commented Sep 4, 2023

Is there an official mime type for models?

Funnily enough, there is an official "model" mime category, so a proper mime type (apparently that's a deprecated name and they are now called media types instead of mime types) would be

model/ggml
model/gguf

Registering new types "officially" seems to be done through an individual review by IANA.

This article links registration procedures near the very top of the page: https://www.iana.org/assignments/media-types/media-types.xhtml#model

From what I can tell, a media type can include identifying magic.

Edit: So apparently, the "Linux" way of adding your own mime type is through XML files, added to one of $XDG_DATA_DIRS and followed by update-mime-database. This article explains it a bit better, it's like 10 years old, but it does appear at least Arch still does it this way: https://www.freedesktop.org/wiki/Specifications/AddingMIMETutor/

And I think, considering the above, this should probably be moved to *.srpm.spec and/or other places directly involved in packaging llama.cpp.

I don't think direct edits of the Linux mime database are supported anywhere, as update-mime-database appears to overwrite it.
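
For the XML route described above, a package file for the shared MIME database could be generated roughly as follows. This is a sketch under assumptions: model/gguf is the unregistered name suggested in this comment, the *.gguf glob and the priority of 50 are illustrative choices, and build_mime_package is a hypothetical helper. The element names and namespace follow the freedesktop.org shared-mime-info format.

```python
import xml.etree.ElementTree as ET

# Namespace used by shared-mime-info package files.
NS = "http://www.freedesktop.org/standards/shared-mime-info"


def build_mime_package(mime_type="model/gguf", pattern="*.gguf", magic="GGUF"):
    """Return the text of a shared-mime-info package describing one type."""
    ET.register_namespace("", NS)  # emit NS as the default namespace
    root = ET.Element(f"{{{NS}}}mime-info")
    mt = ET.SubElement(root, f"{{{NS}}}mime-type", {"type": mime_type})
    ET.SubElement(mt, f"{{{NS}}}comment").text = "GGUF LLM model"
    # Match by file extension...
    ET.SubElement(mt, f"{{{NS}}}glob", {"pattern": pattern})
    # ...and by the identifying bytes at the start of the file.
    mg = ET.SubElement(mt, f"{{{NS}}}magic", {"priority": "50"})
    ET.SubElement(
        mg,
        f"{{{NS}}}match",
        {"type": "string", "offset": "0", "value": magic},
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_mime_package())
```

The resulting text would typically be saved under a packages/ directory inside one of the $XDG_DATA_DIRS mime directories (for a single user, ~/.local/share/mime/packages/), followed by running update-mime-database on the parent mime directory. A packaging spec could install the same file rather than editing the database directly.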

@jboero
Contributor Author

jboero commented Sep 5, 2023

Yes, I had planned to add it to the RPM specs, but I will move it to docs for now. I need to craft it carefully enough to support both the latest Fedora and the oldest *EL. Testers appreciated.

Funny mixing LLM models with 3D models in the model/* category.

@jboero jboero closed this Sep 5, 2023
@jboero jboero mentioned this pull request Sep 5, 2023
3 participants