Commit 1b9088d

docs: Add llama.cpp PR #16780 as reference implementation
Documented that we used llama.cpp PR #16780 as base reference:

- Add Qwen3-VL support PR
- URL: ggml-org/llama.cpp#16780

Analyzed for:

- Dual Conv2D weight handling
- Spatial merge reshape operations
- Position embedding resizing
- Optional tensor loading patterns
- Qwen3-VL architectural details
1 parent 1ef1dd3 commit 1b9088d

File tree

1 file changed: +12 −0 lines changed

Z_Iosu/docs/QWEN3VL_SPLIT_GGUF_IMPLEMENTATION.md

Lines changed: 12 additions & 0 deletions
@@ -13,6 +13,18 @@ Implement minimal changes to Ollama's upstream Qwen3-VL support to enable loadin
This commit represents the latest stable Ollama codebase with working Qwen3-VL support for standard models.

**Reference implementation:** llama.cpp PR #16780

- **Title:** Add Qwen3-VL support
- **URL:** https://github.com/ggml-org/llama.cpp/pull/16780
- **Purpose:** Used as reference for understanding split GGUF architecture and the dual Conv2D approach

The llama.cpp PR was analyzed to understand:

- Dual Conv2D weight handling for temporal_patch_size=3
- Spatial merge reshape operations
- Position embedding resizing strategies
- Optional tensor loading patterns
- Qwen3-VL-specific architectural details
## Problem Statement

The current Ollama implementation supports Qwen3-VL models in standard GGUF format, where all vision model weights are in a single file. Split GGUF models distribute weights across multiple files with structural differences:
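The optional-tensor and dual Conv2D points from the analyzed PR go together: llama.cpp stores the patch-embedding projection as `v.patch_embd.weight` with a second temporal slice under `v.patch_embd.weight.1` that only some models carry, so the loader must treat certain tensors as optional rather than failing on absence. The following is a minimal sketch of that loading pattern, not code from either project; the `get_tensor` helper is a hypothetical name introduced here.

```python
import numpy as np

def get_tensor(tensors: dict, name: str, required: bool = True):
    """Fetch a tensor by name; required tensors raise, optional ones return None.

    Mirrors the optional-loading pattern needed for split GGUF variants,
    which may omit tensors that standard single-file models carry.
    """
    if name in tensors:
        return tensors[name]
    if required:
        raise KeyError(f"missing required tensor: {name}")
    return None

# Hypothetical model file: only the first Conv2D slice is present.
tensors = {"v.patch_embd.weight": np.ones((8, 3))}
w0 = get_tensor(tensors, "v.patch_embd.weight")                   # required
w1 = get_tensor(tensors, "v.patch_embd.weight.1", required=False) # optional
```

When `w1` is `None`, the patch embedding falls back to a single-conv path; when present, the two Conv2D outputs are combined to cover the temporal patch dimension.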
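The spatial merge reshape can be illustrated in isolation: Qwen-VL-style vision towers merge each 2×2 neighborhood of patch embeddings into a single token by concatenating along the channel axis, quartering the token count. A numpy sketch, assuming an (H, W, C) patch grid and a merge factor of 2; `spatial_merge` is a hypothetical helper, not the reference implementation.

```python
import numpy as np

def spatial_merge(patches: np.ndarray, merge: int = 2) -> np.ndarray:
    """Merge each (merge x merge) neighborhood of patch embeddings into one
    token by concatenating channels.

    patches: (H, W, C) grid, H and W divisible by merge.
    returns: (H*W // merge**2, C * merge**2) merged tokens.
    """
    h, w, c = patches.shape
    assert h % merge == 0 and w % merge == 0
    x = patches.reshape(h // merge, merge, w // merge, merge, c)
    x = x.transpose(0, 2, 1, 3, 4)  # group each neighborhood's patches together
    return x.reshape(h * w // merge ** 2, c * merge ** 2)

grid = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)
merged = spatial_merge(grid)  # 16 patches -> 4 merged tokens of 12 channels
```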
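Position embedding resizing typically means resampling a learned (h, w, c) grid to match the patch grid of the incoming image. A hedged sketch using plain bilinear interpolation in numpy; the reference PR may use a different interpolation mode, and `resize_pos_embed` is a name introduced here for illustration.

```python
import numpy as np

def resize_pos_embed(pe: np.ndarray, new_h: int, new_w: int) -> np.ndarray:
    """Bilinearly resample a (h, w, c) position-embedding grid to (new_h, new_w, c)."""
    h, w, _ = pe.shape
    ys = np.linspace(0, h - 1, new_h)   # target rows in source coordinates
    xs = np.linspace(0, w - 1, new_w)   # target cols in source coordinates
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]       # fractional row weights
    wx = (xs - x0)[None, :, None]       # fractional col weights
    top = pe[y0][:, x0] * (1 - wx) + pe[y0][:, x1] * wx
    bot = pe[y1][:, x0] * (1 - wx) + pe[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

pe = np.arange(3 * 3 * 2, dtype=np.float64).reshape(3, 3, 2)
up = resize_pos_embed(pe, 5, 5)  # upsample a 3x3 grid to 5x5
```

Resampling to the original size is an identity, and the four corners are preserved, which makes this easy to sanity-check against whichever interpolation the real loader uses.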
