Is there an existing issue / discussion for this?
- I have searched the existing issues / discussions
Is there an existing answer for this in FAQ?
- I have searched the FAQ
Current Behavior
llama-cpp-python version: 0.2.90
My core code:
```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import MiniCPMv26ChatHandler


# Load the quantized model together with the multimodal projector
def get_model(mmp_model, Q_model):
    chat_handler = MiniCPMv26ChatHandler(clip_model_path=mmp_model, verbose=False)
    llm = Llama(
        n_gpu_layers=-1,
        model_path=Q_model,
        chat_handler=chat_handler,
        n_ctx=1024,
        # draft_model=True
    )
    return llm


# get model
self.llm = get_model(settings.MMP_MODEL, settings.Q_MODEL)

# infer
result = self.llm.create_chat_completion(
    max_tokens=20,
    stop=['。'],
    messages=msgs,
)
```
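For reference, `msgs` is built in the OpenAI-style multimodal format that llama-cpp-python's vision chat handlers accept; the sketch below uses a placeholder image path and prompt rather than my real inputs:

```python
import base64

# Hypothetical helper: encode a local image as a data URI the chat handler can load
def image_to_data_uri(path):
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/jpeg;base64,{data}"

# Placeholder image and prompt; the real inputs differ
msgs = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_to_data_uri("test.jpg")}},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
```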
Every call leaks memory inside llama_chat_format.py.
I tried manually freeing the image_embed, but that did not help.
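For reference, a minimal sketch that should show the growth, assuming the `get_model()` and `msgs` from above; it reads VmRSS from /proc/self/status after each completion:

```python
# Leak-reproduction sketch (assumes get_model() and msgs from above; Linux only)
def rss_mb():
    # Current resident set size of this process, in MB
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024.0
    return 0.0

llm = get_model(settings.MMP_MODEL, settings.Q_MODEL)
for i in range(100):
    llm.create_chat_completion(max_tokens=20, stop=['。'], messages=msgs)
    # Expected: RSS levels off after warm-up; observed: it keeps climbing on every call
    print(f"iteration {i}: RSS = {rss_mb():.1f} MB")
```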
Later I built minicpmv-cli from llama.cpp and modified part of its code so it would run inference repeatedly, to see whether it leaks as well; after roughly 100 iterations it ran into a problem.
Expected Behavior
Is this a llama.cpp issue?
Steps To Reproduce
No response
Environment
- OS: Ubuntu 20.04
- Python: 3.10
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.2
Anything else?
No response