[Feature] Implement InternVL to llama.cpp #522

Open
James4Ever0 opened this issue Aug 20, 2024 · 22 comments

@James4Ever0

Motivation

Many llama.cpp users are requesting this. Ollama is one of the interfaces to llama.cpp and is quite popular. Implementing InternVL support would significantly accelerate InternVL adoption and recognition.

Related resources

ggml-org/llama.cpp#6803

Additional context

InternVL is based on the LLaMA architecture. The text-only InternLM models have already been ported to Ollama, but the multimodal ones have not.

@G-z-w
Collaborator

G-z-w commented Aug 26, 2024

Thank you for your suggestions. We will gradually push support for various frameworks, and we also welcome contributions from the community.

@czczup closed this as completed Sep 25, 2024
@James4Ever0
Author

Hey @czczup, would you please clarify the reason for closing this issue?

@sammcj

sammcj commented Sep 25, 2024

Any update on internVL support with llama.cpp?

@cloudyuyuyu

Any update on internVL support with llama.cpp?

@G-z-w
Collaborator

G-z-w commented Sep 27, 2024

Thank you for your attention. We are actively progressing on this work, and we also welcome contributions from the community.

@rampageservices

Just curious: why was this issue closed if you are actively progressing on this work?

@czczup reopened this Sep 30, 2024
@rampageservices

rampageservices commented Oct 2, 2024 via email

@cloudyuyuyu

Any update?

@Cartomex-MX

November 12, any update? Thanks in advance.

@BleedingDev

Is there anything we could help with? :) InternVL 2.5 is really important for the future. :)

@linxhome

linxhome commented Jan 6, 2025

Paying close attention to this. Any update, please?

@James4Ever0
Author

James4Ever0 commented Jan 6, 2025

Is it really that hard to do? I volunteer to implement it myself.

Current progress: ggml-org/llama.cpp#9403

Also, ipex-llm has support for InternVL2.

@G-z-w
Collaborator

G-z-w commented Jan 6, 2025

> Is it really that hard to do? I volunteer to implement it myself.

Thank you for your willingness to help! We greatly appreciate your initiative and would be glad to have your contributions. If you need us to provide any content or information, feel free to let us know.

@James4Ever0
Author

James4Ever0 commented Jan 8, 2025

@G-z-w Is this model based on the LLaVA architecture? What are the differences in input, output, internal parameters, and more?

@G-z-w
Collaborator

G-z-w commented Jan 8, 2025

> @G-z-w Is this model based on the LLaVA architecture? What are the differences in input, output, internal parameters, and more?

This model, along with the other InternVL chat models, is similar to the LLaVA framework, with the specific structure shown in the link. The differences lie in dynamic resolution and pixel shuffle.

If convenient, we recommend prioritizing the deployment of the InternVL 2.5 series. The parameter details are in the model card and the blog.
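
For context on the pixel-shuffle step mentioned above, here is a minimal PyTorch sketch of the space-to-depth operation that InternVL-style models apply to the ViT feature map to cut the visual token count. The tensor layout and default 0.5 scale factor follow the public InternVL code, but treat this as an illustrative sketch rather than the exact implementation:

```python
import torch

def pixel_shuffle(x: torch.Tensor, scale_factor: float = 0.5) -> torch.Tensor:
    """Space-to-depth on an (N, H, W, C) ViT feature map: trades spatial
    resolution for channel depth, so the number of visual tokens shrinks
    by a factor of 1 / scale_factor**2 (4x fewer at the default 0.5)."""
    n, h, w, c = x.size()
    # (N, H, W, C) -> (N, H, W*s, C/s)
    x = x.view(n, h, int(w * scale_factor), int(c / scale_factor))
    # (N, H, W*s, C/s) -> (N, W*s, H, C/s)
    x = x.permute(0, 2, 1, 3).contiguous()
    # (N, W*s, H, C/s) -> (N, W*s, H*s, C/(s*s))
    x = x.view(n, int(w * scale_factor), int(h * scale_factor),
               int(c / (scale_factor * scale_factor)))
    return x

# A 32x32 patch grid with 1024 channels becomes 16x16 with 4096 channels,
# i.e. 256 visual tokens instead of 1024:
feats = torch.randn(1, 32, 32, 1024)
print(pixel_shuffle(feats).shape)  # torch.Size([1, 16, 16, 4096])
```

Dynamic resolution then determines how many such tiles (and therefore token groups) an input image produces, which is the other main difference from plain LLaVA.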

@James4Ever0
Author

James4Ever0 commented Jan 14, 2025

> > @G-z-w Is this model based on the LLaVA architecture? What are the differences in input, output, internal parameters, and more?
>
> This model, along with the other InternVL chat models, is similar to the LLaVA framework, with the specific structure shown in the link. The differences lie in dynamic resolution and pixel shuffle.
>
> If convenient, we recommend prioritizing the deployment of the InternVL 2.5 series. The parameter details are in the model card and the blog.

Is the model structure of the v2.5 series identical to the v1.5 series? I can now run v1.5 on llama.cpp: qlylangyu/llama.cpp#1

@G-z-w
Collaborator

G-z-w commented Jan 14, 2025

> > > @G-z-w Is this model based on the LLaVA architecture? What are the differences in input, output, internal parameters, and more?
> >
> > This model, along with the other InternVL chat models, is similar to the LLaVA framework, with the specific structure shown in the link. The differences lie in dynamic resolution and pixel shuffle.
> > If convenient, we recommend prioritizing the deployment of the InternVL 2.5 series. The parameter details are in the model card and the blog.
>
> Is the model structure of the v2.5 series identical to the v1.5 series? I can now run v1.5 on llama.cpp.

Yes, the structure of the v2.5 series is identical to that of the v1.5 series, except that the v2.5 series uses different language models.

@BakingBrains

@James4Ever0 any luck running the v2.5 model?

Thank you

@gryffindor-rr

I tried llama.cpp today; it's still not supported:

```
python3 convert_hf_to_gguf.py model_path
ERROR:hf-to-gguf:Model InternVLChatModel is not supported
```

Any update, by any chance?

@MrZeros

MrZeros commented Mar 3, 2025

Any update?

@James4Ever0
Author

James4Ever0 commented Mar 3, 2025

For anyone about to work with the current code, you can check my latest release here.

The 18 KB archive contains function-level diffs, generated with Universal Ctags and some Python magic, so anyone with entry-level C++ knowledge should be able to merge the changes easily.
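
For anyone curious how such function-level diffs can be produced, a rough sketch follows: Universal Ctags lists each function's line span, and difflib then diffs matching functions between the two file versions. The file names and overall flow here are hypothetical, not the exact script behind the archive:

```python
#!/usr/bin/env python3
"""Sketch: function-level diffs via Universal Ctags + difflib."""
import difflib
import json
import subprocess

def function_spans(path: str) -> dict[str, tuple[int, int]]:
    """Map each C++ function name to its (start, end) line span."""
    # --fields=+ne adds the start ("line") and end ("end") line numbers;
    # --kinds-c++=f restricts the output to function definitions.
    out = subprocess.run(
        ["ctags", "--output-format=json", "--fields=+ne",
         "--kinds-c++=f", path],
        capture_output=True, text=True, check=True).stdout
    spans = {}
    for record in map(json.loads, out.splitlines()):
        if record.get("_type") == "tag" and "end" in record:
            # Note: overloads sharing a name collapse to one entry here.
            spans[record["name"]] = (record["line"], record["end"])
    return spans

def function_diffs(old_path: str, new_path: str) -> str:
    """Unified diff of every function present in both file versions."""
    old_lines = open(old_path).readlines()
    new_lines = open(new_path).readlines()
    old_spans = function_spans(old_path)
    new_spans = function_spans(new_path)
    chunks = []
    for name in sorted(old_spans.keys() & new_spans.keys()):
        (a, b), (c, d) = old_spans[name], new_spans[name]
        chunks.extend(difflib.unified_diff(
            old_lines[a - 1:b], new_lines[c - 1:d],
            fromfile=f"old/{name}", tofile=f"new/{name}"))
    return "".join(chunks)

if __name__ == "__main__":
    # Placeholder file names for two versions of a source file.
    print(function_diffs("clip_old.cpp", "clip_new.cpp"))
```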

@BrandonJull

+1
