- Path 2: Separate Query/Key/Value (standard format)
- Maintains full backward compatibility

## Critical Findings: Split GGUF Format Incompleteness

**Date:** 2025-11-08

**Status:** ⚠️ BLOCKER - Split GGUF format incomplete

### Analysis Summary

After implementing dual-backend loading and extensive debugging, we discovered that the split GGUF projector file for `hf.co/unsloth/Qwen3-VL-8B-Instruct-GGUF:Q4_K_M` is **incomplete**.

```
Location: ml/nn.(*LayerNorm).Forward at normalization.go:13
Cause: LayerNorm.Weight is nil

Call stack:
VisionEncoderLayer.Forward:96
→ e.Norm1.Forward(ctx, hiddenStates, opts.eps) // Norm1.Weight is nil
→ CRASH
```

**Root cause:**
1. Ollama's `populateFields()` tries to load `v.blk.N.norm1.weight`
2. The tensor doesn't exist in the projector GGUF
3. The field is left `nil`
4. The forward pass crashes on the first layer (see the sketch below)
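
To make the failure mode concrete, here is a minimal, runnable Go sketch; `Tensor`, `LayerNorm`, and `Forward` are simplified stand-ins for Ollama's `ml`/`nn` types, not the actual implementation:

```go
package main

import "fmt"

// Tensor is a stand-in for ml.Tensor; only what the sketch needs.
type Tensor struct{ Name string }

func (t *Tensor) Mul(other *Tensor) *Tensor {
	return &Tensor{Name: t.Name + " * " + other.Name} // panics if other is nil
}

// LayerNorm mirrors the shape of the real layer: Weight is expected to be
// populated from the GGUF, and nothing guards against it being absent.
type LayerNorm struct {
	Weight *Tensor // should come from "v.blk.N.norm1.weight"
}

func (ln *LayerNorm) Forward(hidden *Tensor) *Tensor {
	// With a nil Weight this dereference panics -- the crash reported
	// at normalization.go:13 in the trace above.
	return hidden.Mul(ln.Weight)
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("panic:", r) // nil pointer dereference
		}
	}()
	norm1 := &LayerNorm{} // populateFields() found no tensor; the field stayed nil
	norm1.Forward(&Tensor{Name: "hiddenStates"})
}
```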

### Why llama.cpp Succeeds

llama.cpp loads these tensors with the optional flag set:

```cpp
// The trailing `false` marks the tensor as optional:
layer.ln_1_w = get_tensor(string_format(TN_LN_1, prefix, il, "weight"), false);
```

When a tensor is missing:
- `get_tensor` returns `nullptr` without raising an error
- The LayerNorm operation is likely skipped or replaced by an identity
- The model continues without crashing

### Comparison with llama.cpp Implementation

**llama.cpp approach (PR #16780):**
- Uses dual Conv2D (each 1152 channels)
- Complex spatial merge reshape operations
- Position embedding resized with bilinear interpolation
- LayerNorm and MLP loaded as **optional**
- Missing tensors handled gracefully with defaults

**Our approach:**
- Uses Conv3D with dual weights (384+384=768 channels)
- Simple padding to match expected dimensions
- Position embedding skipped when incompatible
- LayerNorm and MLP are **required** by struct definition
- Missing tensors cause nil pointer crashes
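
A rough sketch of why struct-based loading leaves fields nil. The struct, tags, and `populate` helper below are hypothetical simplifications of tag-driven loading, not Ollama's actual `populateFields()`:

```go
package main

import (
	"fmt"
	"reflect"
)

// Stand-in for ml.Tensor; not Ollama's real type.
type Tensor struct{ Name string }

// VisionEncoderLayer sketches a struct whose tagged fields are all required
// in practice: there is no "optional" concept in the loader.
type VisionEncoderLayer struct {
	QueryWeight *Tensor `gguf:"attn_q.weight"`
	Norm1Weight *Tensor `gguf:"norm1.weight"`
	MLPUpWeight *Tensor `gguf:"mlp.up.weight"`
}

// populate mimics struct-tag-driven loading: each tagged field is looked up
// by name and silently left nil when the GGUF has no such tensor.
func populate(layer any, tensors map[string]*Tensor) {
	v := reflect.ValueOf(layer).Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		name := t.Field(i).Tag.Get("gguf")
		if tensor, ok := tensors[name]; ok {
			v.Field(i).Set(reflect.ValueOf(tensor))
		}
		// Missing tensor: the field stays nil, no error, no later nil check.
	}
}

func main() {
	// The split projector ships attention weights but no norm/MLP weights.
	projector := map[string]*Tensor{
		"attn_q.weight": {Name: "attn_q.weight"},
	}
	var layer VisionEncoderLayer
	populate(&layer, projector)
	fmt.Println("Norm1Weight:", layer.Norm1Weight) // <nil> → nil pointer crash later
	fmt.Println("MLPUpWeight:", layer.MLPUpWeight) // <nil>
}
```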

### Attempted Solutions

1. **Dual-backend tensor loading** ✅
   - Successfully loads attention weights from the projector
   - Correctly falls back to the main backend
   - Works for every tensor that is actually present (see the sketch after this list)

2. **Dynamic tensor creation** ❌
   - Attempted to create identity LayerNorm/MLP tensors
   - Multiple compilation errors (no Ones/Zeros/Eye methods)
   - Would require extensive GGML backend changes

3. **Optional struct fields** ❌
   - Would break existing model loading
   - Requires nil-checking throughout the forward pass
   - Significant architectural change
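
A sketch of the dual-backend fallback described in attempt 1. `Backend`, `ggufFile`, and `loadTensor` are illustrative names under the assumption of a simple lookup-by-name interface; they are not Ollama's real API:

```go
package main

import "fmt"

// Stand-in types: Backend abstracts "a GGUF file we can ask for tensors by name".
type Tensor struct{ Name string }

type Backend interface {
	Get(name string) (*Tensor, bool)
}

type ggufFile map[string]*Tensor

func (f ggufFile) Get(name string) (*Tensor, bool) {
	t, ok := f[name]
	return t, ok
}

// loadTensor prefers the split projector file and falls back to the main
// model file, reporting tensors that exist in neither.
func loadTensor(name string, projector, base Backend) (*Tensor, error) {
	if t, ok := projector.Get(name); ok {
		return t, nil
	}
	if t, ok := base.Get(name); ok {
		return t, nil
	}
	return nil, fmt.Errorf("tensor %q missing from both backends", name)
}

func main() {
	projector := ggufFile{"v.blk.0.attn_q.weight": {Name: "v.blk.0.attn_q.weight"}}
	mainModel := ggufFile{"token_embd.weight": {Name: "token_embd.weight"}}

	for _, name := range []string{
		"v.blk.0.attn_q.weight", // present in the projector
		"token_embd.weight",     // present in the main model file
		"v.blk.0.norm1.weight",  // present in neither: the blocker
	} {
		if t, err := loadTensor(name, projector, mainModel); err != nil {
			fmt.Println("FAIL:", err)
		} else {
			fmt.Println("ok:  ", t.Name)
		}
	}
}
```

The fallback itself works; it is only the tensors absent from both files (the norm and MLP weights) that remain unresolvable.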

### Incompatibility Assessment

**This split GGUF format is fundamentally incompatible with Ollama's architecture:**

| Component | llama.cpp | Ollama | Compatible? |
|-----------|-----------|--------|-------------|
| Optional tensor loading | ✅ Yes | ❌ No | ❌ |
| Nil-safe forward pass | ✅ Yes | ❌ No | ❌ |
| LayerNorm required | ❌ No | ✅ Yes | ❌ |
| MLP required | ❌ No | ✅ Yes | ❌ |
| Struct-based loading | ❌ No | ✅ Yes | ❌ |

### Recommendations

**Option 1: Use non-split GGUF (RECOMMENDED)**
- Use standard single-file GGUF models
- All weights present in one file
- Full compatibility with Ollama
- No code changes needed

**Option 2: Complete the split GGUF**
- Add the missing LayerNorm weights to the projector
- Add the missing MLP weights to the projector
- Regenerate the split GGUF with the complete tensor set
- Requires access to the original model weights

**Option 3: Major Ollama refactor (NOT RECOMMENDED)**
- Implement an optional tensor loading system
- Add a nil-safe forward pass for all layers (sketched after this list)
- Make LayerNorm and MLP optional
- Extensive testing required
- High maintenance burden
- Significant architectural changes
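
For a sense of what Option 3 would entail, this is roughly the nil-guarded forward pass that every layer type would need (again using stand-in types rather than Ollama's `ml`/`nn` packages); a missing tensor degrades to an identity operation, mirroring llama.cpp's behaviour:

```go
package main

import "fmt"

// Simplified stand-ins; not Ollama's real types.
type Tensor struct{ Name string }

func scale(t *Tensor, by string) *Tensor { return &Tensor{Name: t.Name + " * " + by} }

type LayerNorm struct{ Weight *Tensor }

// Forward applies the norm only when its weight was actually loaded;
// otherwise it passes the input through unchanged.
func (ln *LayerNorm) Forward(hidden *Tensor) *Tensor {
	if ln == nil || ln.Weight == nil {
		return hidden // identity: tensor was missing from the split GGUF
	}
	return scale(hidden, ln.Weight.Name)
}

func main() {
	loaded := &LayerNorm{Weight: &Tensor{Name: "norm1.weight"}}
	missing := &LayerNorm{} // weight absent from the projector GGUF

	h := &Tensor{Name: "hiddenStates"}
	fmt.Println(loaded.Forward(h).Name)  // hiddenStates * norm1.weight
	fmt.Println(missing.Forward(h).Name) // hiddenStates (identity)
}
```

Repeating this guard across every layer and every model Ollama supports is the maintenance burden that makes this option unattractive.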

### Conclusion

The split GGUF format as currently distributed is **incomplete and incompatible** with Ollama's model loading architecture. The projector file contains only attention weights and is missing the critical LayerNorm and MLP components that Ollama requires for inference.

**Status:** Cannot proceed without one of the following:
1. A complete split GGUF with all required tensors, OR
2. A standard non-split GGUF model, OR
3. A major Ollama architectural refactor (not recommended)

## Notes

- All code comments and documentation use English
- Changes are minimal and surgical to reduce maintenance burden
- Backward compatibility with standard models is mandatory
- Split GGUF support is additive, not replacing existing functionality
- **Split GGUF format incomplete: LayerNorm/MLP weights missing from projector**