Commit 1ef1dd3

docs: Document critical split GGUF format incompleteness
Added comprehensive analysis of the split GGUF blocker:
- Listed all present tensors (attention, patch_embd, mergers)
- Listed all missing tensors (LayerNorm, MLP)
- Explained why Ollama crashes (nil pointer in Forward)
- Explained why llama.cpp works (optional tensor loading)
- Compared architectural approaches
- Documented attempted solutions
- Provided incompatibility assessment table
- Listed 3 options: non-split GGUF, complete split, or major refactor
- Concluded format is fundamentally incompatible with Ollama

Status: Cannot proceed without complete tensors or architecture change.

Z_Iosu/docs/QWEN3VL_SPLIT_GGUF_IMPLEMENTATION.md

@@ -215,9 +215,158 @@ git checkout upstream/main -- ml/nn/convolution.go
- Path 2: Separate Query/Key/Value (standard format)
- Maintains full backward compatibility

## Critical Findings: Split GGUF Format Incompleteness

**Date:** 2025-11-08
**Status:** ⚠️ BLOCKER - Split GGUF format incomplete

### Analysis Summary

After implementing dual-backend loading and extensive debugging, we discovered that the split GGUF projector file for `hf.co/unsloth/Qwen3-VL-8B-Instruct-GGUF:Q4_K_M` is **incomplete**.

### Tensors Present in Projector GGUF

**Attention weights:**
- `v.blk.N.attn_qkv.weight` / `v.blk.N.attn_qkv.bias`
- `v.blk.N.attn_out.weight` / `v.blk.N.attn_out.bias`

**Patch embedding:**
- `v.patch_embd.weight` / `v.patch_embd.weight.1` / `v.patch_embd.bias`

**Position embedding (optional):**
- `v.position_embd.weight`

**Post LayerNorm:**
- `v.post_ln.weight` / `v.post_ln.bias`

**Mergers:**
- `mm.0.weight` / `mm.0.bias` (FC1)
- `mm.2.weight` / `mm.2.bias` (FC2)
- `v.deepstack.N.*` (deepstack mergers for layers 8, 16, 24)

### Tensors MISSING from Projector GGUF

**Vision LayerNorm:**
- `v.blk.N.norm1.weight` / `v.blk.N.norm1.bias`
- `v.blk.N.norm2.weight` / `v.blk.N.norm2.bias`

**Vision MLP:**
- `v.blk.N.mlp.linear_fc1.weight` / `v.blk.N.mlp.linear_fc1.bias`
- `v.blk.N.mlp.linear_fc2.weight` / `v.blk.N.mlp.linear_fc2.bias`

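The gap can be confirmed mechanically from the tensor name list alone. Below is a minimal, self-contained Go sketch (illustrative only, not Ollama code) that compares the names found in the projector against the per-block names listed above; the block count and the `present` set in `main` are toy values.

```go
package main

import "fmt"

// Per-block vision tensors the model definition expects. The norm1/norm2 and
// mlp.linear_fc* entries are exactly the ones absent from the split projector.
var requiredPerBlock = []string{
	"attn_qkv.weight", "attn_qkv.bias",
	"attn_out.weight", "attn_out.bias",
	"norm1.weight", "norm1.bias",
	"norm2.weight", "norm2.bias",
	"mlp.linear_fc1.weight", "mlp.linear_fc1.bias",
	"mlp.linear_fc2.weight", "mlp.linear_fc2.bias",
}

// missingTensors reports which expected "v.blk.N.*" names are not present.
func missingTensors(present map[string]bool, numBlocks int) []string {
	var missing []string
	for blk := 0; blk < numBlocks; blk++ {
		for _, suffix := range requiredPerBlock {
			name := fmt.Sprintf("v.blk.%d.%s", blk, suffix)
			if !present[name] {
				missing = append(missing, name)
			}
		}
	}
	return missing
}

func main() {
	// Toy input: block 0 with only the attention tensors, mirroring what the
	// split projector actually ships.
	present := map[string]bool{
		"v.blk.0.attn_qkv.weight": true,
		"v.blk.0.attn_qkv.bias":   true,
		"v.blk.0.attn_out.weight": true,
		"v.blk.0.attn_out.bias":   true,
	}
	for _, name := range missingTensors(present, 1) {
		fmt.Println("missing:", name) // prints the norm1/norm2/mlp names
	}
}
```
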
### Why Ollama Fails

```
Runtime Error: nil pointer dereference
Location: ml/nn.(*LayerNorm).Forward at normalization.go:13
Cause: LayerNorm.Weight is nil

Call stack:
VisionEncoderLayer.Forward:96
→ e.Norm1.Forward(ctx, hiddenStates, opts.eps) // Norm1.Weight is nil
→ CRASH
```

**Root cause:**
1. Ollama's `populateFields()` tries to load `v.blk.N.norm1.weight`
2. Tensor doesn't exist in projector GGUF
3. Field remains `nil`
4. Forward pass crashes on first layer

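The failure mode itself is easy to reproduce in isolation. This is a stripped-down sketch using stand-in types (not the real `ml/nn` package): a struct-typed LayerNorm whose `Weight` field was never populated panics on first use, which is the shape of the crash above.

```go
package main

import "fmt"

// Tensor and Context are stand-ins for Ollama's ml.Tensor / ml.Context.
type Tensor struct{ name string }
type Context struct{}

// LayerNorm mirrors the structural idea of ml/nn.LayerNorm: Weight is a plain
// field with no notion of "optional". If the loader finds no matching tensor,
// the field simply stays nil.
type LayerNorm struct {
	Weight *Tensor
	Bias   *Tensor
}

func (ln *LayerNorm) Forward(ctx *Context, x *Tensor, eps float32) *Tensor {
	// First use of ln.Weight dereferences a nil pointer when the tensor was
	// never loaded -- the same class of crash as normalization.go:13.
	fmt.Println("normalizing with", ln.Weight.name)
	return x
}

type VisionEncoderLayer struct {
	Norm1 *LayerNorm
}

func main() {
	// The projector GGUF has no "v.blk.0.norm1.weight", so Weight stays nil.
	layer := &VisionEncoderLayer{Norm1: &LayerNorm{}}
	layer.Norm1.Forward(&Context{}, &Tensor{name: "hidden"}, 1e-6) // panics here
}
```
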
### Why llama.cpp Succeeds

llama.cpp loads tensors with an optional flag:

```cpp
layer.ln_1_w = get_tensor(string_format(TN_LN_1, prefix, il, "weight"), false);
//                                                                       ^^^^^
//                                                                       optional=true
```

When a tensor is missing:
- Returns `nullptr` without error
- Likely skips the LayerNorm operation or uses identity
- Model continues without crash

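Mirroring that behaviour in Go would mean treating a missing tensor as nil at load time and skipping the op at call time. A hypothetical sketch of that pattern follows; the `Backend` interface, `mapBackend`, and the identity fallback are illustrative assumptions, not Ollama APIs.

```go
package main

import "fmt"

type Tensor struct{ name string }

// Backend is a hypothetical lookup interface: Get returns nil for a missing
// tensor instead of failing the whole load, analogous to llama.cpp's
// get_tensor(name, /*required=*/false).
type Backend interface {
	Get(name string) *Tensor
}

type mapBackend map[string]*Tensor

func (m mapBackend) Get(name string) *Tensor { return m[name] }

type LayerNorm struct{ Weight, Bias *Tensor }

// applyNorm is the nil-safe call site that would have to accompany optional
// loading: when the weights were never shipped, fall back to identity.
func applyNorm(ln *LayerNorm, x *Tensor) *Tensor {
	if ln == nil || ln.Weight == nil {
		return x // identity: one way to mirror llama.cpp's graceful handling
	}
	fmt.Println("applying LayerNorm", ln.Weight.name)
	return x
}

func main() {
	projector := mapBackend{} // no norm1 weights in the split projector
	norm1 := &LayerNorm{
		Weight: projector.Get("v.blk.0.norm1.weight"), // nil, no error
		Bias:   projector.Get("v.blk.0.norm1.bias"),   // nil, no error
	}
	out := applyNorm(norm1, &Tensor{name: "hidden"})
	fmt.Println("output:", out.name) // continues instead of crashing
}
```
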
### Comparison with llama.cpp Implementation

**llama.cpp approach (PR #16780):**
- Uses dual Conv2D (each 1152 channels)
- Complex spatial merge reshape operations
- Position embedding resized with bilinear interpolation
- LayerNorm and MLP loaded as **optional**
- Missing tensors handled gracefully with defaults

**Our approach:**
- Uses Conv3D with dual weights (384+384=768 channels)
- Simple padding to match expected dimensions
- Position embedding skipped when incompatible
- LayerNorm and MLP are **required** by struct definition
- Missing tensors cause nil pointer crashes

### Attempted Solutions

1. **Dual-backend tensor loading** (see the sketch after this list)
   - Successfully loads attention weights from projector
   - Correctly falls back to main backend
   - Works perfectly for available tensors

2. **Dynamic tensor creation**
   - Attempted to create identity LayerNorm/MLP
   - Multiple compilation errors (no Ones/Zeros/Eye methods)
   - Would require extensive GGML backend changes

3. **Optional struct fields**
   - Would break existing model loading
   - Requires nil-checking throughout forward pass
   - Significant architectural change

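For reference, the dual-backend loading in attempt 1 boils down to a name lookup that tries the projector file first and falls back to the main model file. A hypothetical sketch of that lookup order is shown below; `tensorSource` and `lookupTensor` are illustrative names, not the actual implementation.

```go
package main

import "fmt"

type Tensor struct{ name string }

// tensorSource is a hypothetical view over one GGUF file's tensors.
type tensorSource map[string]*Tensor

// lookupTensor tries the projector GGUF first, then the base model GGUF.
// This works for tensors that exist somewhere (e.g. attn_qkv), but cannot
// help when a tensor exists in neither file (norm1/norm2, mlp.linear_fc*).
func lookupTensor(projector, base tensorSource, name string) *Tensor {
	if t, ok := projector[name]; ok {
		return t
	}
	if t, ok := base[name]; ok {
		return t
	}
	return nil // missing everywhere -> the field stays nil and Forward crashes
}

func main() {
	projector := tensorSource{
		"v.blk.0.attn_qkv.weight": {name: "v.blk.0.attn_qkv.weight"},
	}
	base := tensorSource{}

	fmt.Println(lookupTensor(projector, base, "v.blk.0.attn_qkv.weight")) // found
	fmt.Println(lookupTensor(projector, base, "v.blk.0.norm1.weight"))    // <nil>
}
```
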
### Incompatibility Assessment

**This split GGUF format is fundamentally incompatible with Ollama's architecture:**

| Component | llama.cpp | Ollama | Compatible? |
|-----------|-----------|--------|-------------|
| Optional tensor loading | ✅ Yes | ❌ No | ❌ |
| Nil-safe forward pass | ✅ Yes | ❌ No | ❌ |
| LayerNorm required | ❌ No | ✅ Yes | ❌ |
| MLP required | ❌ No | ✅ Yes | ❌ |
| Struct-based loading | ❌ No | ✅ Yes | ❌ |

### Recommendations

**Option 1: Use non-split GGUF (RECOMMENDED)**
- Use standard single-file GGUF models
- All weights present in one file
- Full compatibility with Ollama
- No code changes needed

**Option 2: Complete the split GGUF**
- Add missing LayerNorm weights to projector
- Add missing MLP weights to projector
- Regenerate split GGUF with complete tensors
- Requires access to the original model weights

**Option 3: Major Ollama refactor (NOT RECOMMENDED)**
- Implement an optional tensor loading system
- Add nil-safe forward pass for all layers
- Make LayerNorm and MLP optional
- Extensive testing required
- High maintenance burden
- Significant architectural changes

### Conclusion

The split GGUF format as currently distributed is **incomplete and incompatible** with Ollama's model loading architecture. The projector file ships attention weights, patch/position embeddings, and merger weights, but omits the per-block LayerNorm and MLP tensors that Ollama requires for inference.

**Status:** Cannot proceed without:
1. Complete split GGUF with all required tensors, OR
2. Standard non-split GGUF model, OR
3. Major Ollama architectural refactor (not recommended)

## Notes

- All code comments and documentation use English
- Changes are minimal and surgical to reduce maintenance burden
- Backward compatibility with standard models is mandatory
- Split GGUF support is additive, not replacing existing functionality
- **Split GGUF format incomplete - LayerNorm/MLP weights missing from projector**
