llava-cli segfault from heap corruption right on load on vc++ and on wsl #4693

@cmp-nct

Description

Filing this as a bug report, since I'm not sure the PR is still being watched after the merge.

Since PR #4205 there is a segfault on Windows and on WSL when using llava-cli (in clip.cpp).
It looks like heap corruption that is triggered as soon as ctx0 in build_graph is free()'d, but the corruption has already happened before that point (removing the free only shifts the segfault location).

  1. I tried two different standard llava models, both worked with the previous clip.
  2. I tried the precompiled release exe as well as compilations on WSL and on Windows
  3. I re-converted a llava model from start - just to make sure

Happens in both CPU and GPU mode. It may be in the way memory buffer sizes are measured, which has changed significantly.
Seems related to the new backend buffers.

Example command:
./bin/llava-cli -m /mnt/q/models/llava/liuhaotianllava-v1.5-7b/ggml-model-q3_k --mmproj /mnt/q/models/llava/liuhaotianllava-v1.5-7b/mmproj-model-f16.gguf --image /mnt/c/temp/tmp.png

clip_model_load: model name:   openai/clip-vit-large-patch14-336
clip_model_load: description:  image encoder for LLaVA
clip_model_load: GGUF version: 2
clip_model_load: alignment:    32
clip_model_load: n_tensors:    377
clip_model_load: n_kv:         18
clip_model_load: ftype:        f16

clip_model_load: CLIP using CPU backend
clip_model_load: text_encoder:   0
clip_model_load: vision_encoder: 1
clip_model_load: llava_projector:  1
clip_model_load: model size:     595.53 MB
clip_model_load: metadata size:  0.14 MB
clip_model_load: params backend buffer size =  595.53 MB (377 tensors)
Segmentation fault

Valgrind didn't show more than what I had already seen: the free() is causing an invalid read later on.

==4434== Warning: set address range perms: large range [0x5184040, 0x2a50c500) (undefined)
==4434== Invalid read of size 8
==4434==    at 0x217C15: ggml_allocr_alloc_graph (in /mnt/q/vanilla/llama.cpp/build_linux/bin/llava-cli)
==4434==    by 0x14CA6A: clip_model_load (in /mnt/q/vanilla/llama.cpp/build_linux/bin/llava-cli)
==4434==    by 0x11F39A: main (in /mnt/q/vanilla/llama.cpp/build_linux/bin/llava-cli)
==4434==  Address 0x4ec98b8 is 72 bytes inside a block of size 884,880 free'd
==4434==    at 0x48399AB: free (vg_replace_malloc.c:538)
==4434==    by 0x1F2DC8: ggml_free (in /mnt/q/vanilla/llama.cpp/build_linux/bin/llava-cli)
==4434==    by 0x126928: clip_image_build_graph(clip_ctx const*, clip_image_f32_batch const*) (in /mnt/q/vanilla/llama.cpp/build_linux/bin/llava-cli)
==4434==    by 0x14CA5B: clip_model_load (in /mnt/q/vanilla/llama.cpp/build_linux/bin/llava-cli)
==4434==    by 0x11F39A: main (in /mnt/q/vanilla/llama.cpp/build_linux/bin/llava-cli)
==4434==  Block was alloc'd at
==4434==    at 0x483AEB8: memalign (vg_replace_malloc.c:906)
==4434==    by 0x483AFCE: posix_memalign (vg_replace_malloc.c:1070)
==4434==    by 0x1F29FB: ggml_init (in /mnt/q/vanilla/llama.cpp/build_linux/bin/llava-cli)
==4434==    by 0x125D8C: clip_image_build_graph(clip_ctx const*, clip_image_f32_batch const*) (in /mnt/q/vanilla/llama.cpp/build_linux/bin/llava-cli)
==4434==    by 0x14CA5B: clip_model_load (in /mnt/q/vanilla/llama.cpp/build_linux/bin/llava-cli)
==4434==    by 0x11F39A: main (in /mnt/q/vanilla/llama.cpp/build_linux/bin/llava-cli)

I'm a bit puzzled: I can reproduce it on two "platforms" and with a clean rebuild, yet the change has clearly been in for a week or two, so it does not appear to happen for everyone.
