Add support for CogVLM model #15002

Tianyue-Zhao · 2025-08-01T01:50:42Z

This addresses the requests for CogVLM in #4387 and #4350.
CogVLM is a popular model, and it now integrates cleanly thanks to the recent additions to libmtmd.
I've converted a GGUF here: Link to GGUF files

Sample command and output:

build/bin/llama-mtmd-cli -m ../cogvlm-chat-hf/cogvlm-13B-chat-v1.1-F16.gguf --mmproj ../cogvlm-chat-hf/mmproj-cogvlm-chat-hf --image ./community.png --chat-template vicuna -p "Describe the picture"

load_hparams: model size:         8448.53 MiB
load_hparams: metadata size:      0.36 MiB
alloc_compute_meta:        CPU compute buffer size =   142.02 MiB
main: loading model: ../cogvlm-chat-hf/cogvlm-13B-chat-v1.1-F16.gguf
encoding image slice...
image slice encoded in 16135 ms
decoding image batch 1/1, n_tokens_batch = 1227
image decoded (batch 1/1) in 54065 ms

1. The image showcases a futuristic urban landscape with a mix of architectural styles. The buildings are multi-storied and have a combination of traditional and modern elements. There's a prominent tree in the foreground, suggesting a blend of nature and urban development. The scene appears to be bustling with activity, with various signs and billboards, indicating commercial or residential zones.


llama_perf_context_print:        load time =  108969.65 ms
llama_perf_context_print: prompt eval time =   85229.27 ms /  1241 tokens (   68.68 ms per token,    14.56 tokens per second)
llama_perf_context_print:        eval time =   19843.15 ms /    83 runs   (  239.07 ms per token,     4.18 tokens per second)
llama_perf_context_print:       total time =  126951.23 ms /  1324 tokens
llama_perf_context_print:    graphs reused =          0
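
The per-token figures in the log above follow directly from the reported times and counts. As a minimal sketch (the `throughput` helper and its regular expression are illustrative and not part of llama.cpp), the two timing lines can be re-derived like this:

```python
import re

# Two timing lines copied verbatim from the log above.
LOG = """\
llama_perf_context_print: prompt eval time =   85229.27 ms /  1241 tokens (   68.68 ms per token,    14.56 tokens per second)
llama_perf_context_print:        eval time =   19843.15 ms /    83 runs   (  239.07 ms per token,     4.18 tokens per second)
"""

# Matches "<label> time = <ms> ms / <count> tokens|runs".
PATTERN = re.compile(r"(\w[\w ]*?) time =\s*([\d.]+) ms /\s*(\d+) (?:tokens|runs)")


def throughput(log: str) -> dict:
    """Return {label: (ms per token, tokens per second)} for each timing line."""
    result = {}
    for label, ms, count in PATTERN.findall(log):
        ms, count = float(ms), int(count)
        result[label.strip()] = (
            round(ms / count, 2),             # milliseconds per token
            round(count / (ms / 1000.0), 2),  # tokens per second
        )
    return result


if __name__ == "__main__":
    for label, (ms_per_tok, tok_per_s) in throughput(LOG).items():
        print(f"{label}: {ms_per_tok} ms/token, {tok_per_s} tokens/s")
```

The recomputed values match the ones printed in the log: prompt processing runs at about 14.56 tokens per second and generation at about 4.18 tokens per second.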

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

Tianyue-Zhao · 2025-08-02T16:47:30Z

I think I've fixed the typecheck and format check workflows that were failing before. Can someone approve the workflows to run again?
Also, is there a way to run these GitHub workflows locally or without needing approval from a reviewer?
It would be good to run these CI/CD checks myself before posting the PR.

CISC · 2025-08-02T18:36:51Z

> Also, is there a way to run these GitHub workflows locally or without needing approval from a reviewer? It would be good to run these CI/CD checks myself before posting the PR.

You can run flake8, pyright and editorconfig locally (or via IDE plugins); the build tests can be run manually with ctest.
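
A rough sketch of how these local checks could be wrapped in a single script follows. The tool names come from the comment above; the exact arguments (the file path passed to flake8 and pyright, the `editorconfig-checker` executable name, and the `build` directory for ctest) are assumptions and may not match the project's CI configuration, and `run_checks` is a hypothetical helper.

```python
import shutil
import subprocess

# Assumed command lines: only the tool names are taken from the thread.
# Adjust the arguments to match the repository's actual CI configuration.
CHECKS = {
    "flake8": ["flake8", "convert_hf_to_gguf.py"],
    "pyright": ["pyright", "convert_hf_to_gguf.py"],
    "editorconfig": ["editorconfig-checker"],
    "ctest": ["ctest", "--test-dir", "build", "--output-on-failure"],
}


def run_checks(checks=CHECKS, runner=subprocess.run, which=shutil.which):
    """Run each check whose executable is on PATH; return {name: status}."""
    results = {}
    for name, cmd in checks.items():
        if which(cmd[0]) is None:
            # Tool is not installed, so skip it rather than fail.
            results[name] = "missing"
            continue
        results[name] = "ok" if runner(cmd).returncode == 0 else "failed"
    return results
```

Calling `run_checks()` from the repository root after a build returns a status per tool. The `runner` and `which` parameters exist only so the function can be exercised without the tools installed.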

CISC

This is not a complete review as I don't know enough about mtmd, just commenting...

convert_hf_to_gguf.py

src/llama-model.cpp

Tianyue-Zhao · 2025-08-06T00:12:09Z

> > Also, is there a way to run these GitHub workflows locally or without needing approval from a reviewer? It would be good to run these CI/CD checks myself before posting the PR.

> You can run flake8, pyright and editorconfig locally (or via IDE plugins); the build tests can be run manually with ctest.

Thanks for the info! That's something I've been wondering about for a while.

src/llama-model.cpp

tools/mtmd/clip.cpp

CISC

Further refinement (merge cont+reshape).

src/llama-model.cpp

tools/mtmd/clip.cpp

CISC · 2025-08-29T11:19:10Z

> Further refinement (merge cont+reshape).

After #15662 we can avoid these altogether and just create 3D views.

CISC · 2025-09-08T09:43:31Z

> > Further refinement (merge cont+reshape).

> After #15662 we can avoid these altogether and just create 3D views.

Merged, rebase and apply updated suggestions.

Tianyue-Zhao · 2025-09-09T00:14:23Z

> > > Further refinement (merge cont+reshape).

> > After #15662 we can avoid these altogether and just create 3D views.

> Merged, rebase and apply updated suggestions.

Thanks for the reminder, I've rebased it and removed the extra ggml_cont calls.

ngxson · 2025-10-29T20:53:33Z

Sorry, I missed the notification to review this. I will have a look and push commits to resolve the conflicts.

ngxson

I don't have enough VRAM to test the model right now, but I think the code should be good to merge (once CI passes).

Feel free to give it a try even after the PR is merged. If there are bugs, we can make follow-up PRs to fix them.

ngxson · 2025-10-30T09:55:56Z

No idea why the ASAN test failed, probably just a random runtime issue. I'm re-running the CI

CISC · 2025-10-30T10:00:31Z

> No idea why the ASAN test failed, probably just a random runtime issue. I'm re-running the CI

It's ccache getting poisoned somehow; I've yet to track down the reason. When it happens, you have to find and delete all the caches on the branch and on master (I've just deleted the master ones), then rerun.

ngxson · 2025-10-30T23:25:31Z

By the way @Tianyue-Zhao, it seems this implementation still uses the legacy llava preprocessing and does not support dynamic resolution. Is this expected?

github-actions bot added examples python python script changes labels Aug 1, 2025

Tianyue-Zhao marked this pull request as ready for review August 1, 2025 02:15

Tianyue-Zhao force-pushed the cogvlm_support branch from 42113d1 to de22157 on August 2, 2025 16:45

CISC reviewed Aug 2, 2025

convert_hf_to_gguf.py (resolved)

convert_hf_to_gguf.py (outdated, resolved)

convert_hf_to_gguf.py (outdated, resolved)

convert_hf_to_gguf.py (outdated, resolved)

src/llama-model.cpp (outdated, resolved)

Tianyue-Zhao force-pushed the cogvlm_support branch from de22157 to a571d9a on August 8, 2025 00:35

CISC reviewed Aug 8, 2025

src/llama-model.cpp (outdated, resolved)

tools/mtmd/clip.cpp (outdated, resolved)

Tianyue-Zhao force-pushed the cogvlm_support branch from a571d9a to ac3992d on August 10, 2025 23:40

CISC reviewed Aug 18, 2025

src/llama-model.cpp (outdated, resolved)

tools/mtmd/clip.cpp (outdated, resolved)

Tianyue-Zhao added 16 commits September 8, 2025 23:48

Added GGUF mappings for CogVLM model

da81af4

Add tensor mapping for CogVLM visual encoder

6de9d16

Add CogVLM to conversion script, no vision part yet

11fac0c

Added CogVLM vision model to conversion script

a3adac1

Add graph for CogVLM CLIP model

3026781

Add graph for CogVLM

183ca2e

Fixes for CogVLM. Now compiles.

ac3e348

Model now runs

0634dc9

Fixes for cogvlm graph

4c080bf

Account for graph context change after rebase

bc3f084

Changes for whitespace

bcbd6ef

Changes in convert script according to comments

c65f5aa

Switch CogVLM LLM graph to merged QKV tensor

00af7ee

Use rope_type variable instead of direct definition

79e5640

Change CogVLM CLIP encoder to use SWIGLU

86a10bc

Switch CogVLM CLIP to use merged QKV

a959a1f

Apply rebase edits and remove ggml_cont call that is now unnecessary

06a0719

Tianyue-Zhao force-pushed the cogvlm_support branch from ac3992d to 06a0719 on September 9, 2025 00:13

CISC requested a review from ngxson September 9, 2025 07:59

CISC mentioned this pull request Oct 29, 2025

[model] add support for qwen3vl series #16780

Merged

ngxson added 2 commits October 29, 2025 23:59

Merge branch 'master' into cogvlm_support

f5cb848

clean up

85764bd

ngxson approved these changes Oct 29, 2025

ngxson merged commit bacddc0 into ggml-org:master Oct 30, 2025
189 of 192 checks passed

3 participants