Is there an existing issue / discussion for this?
- I have searched the existing issues / discussions
Is there an existing answer for this in FAQ?
- I have searched the FAQ
Current Behavior
llama-cpp-python version: 0.2.90
My core code:
```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import MiniCPMv26ChatHandler


# Load the quantized model together with the multimodal projector
def get_model(mmp_model, Q_model):
    chat_handler = MiniCPMv26ChatHandler(clip_model_path=mmp_model, verbose=False)
    llm = Llama(
        n_gpu_layers=-1,
        model_path=Q_model,
        chat_handler=chat_handler,
        n_ctx=1024,
        # draft_model=True
    )
    return llm


# get model
self.llm = get_model(settings.MMP_MODEL, settings.Q_MODEL)

# infer
result = self.llm.create_chat_completion(
    max_tokens=20,
    stop=['。'],
    messages=msgs,
)
```
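For reference, `msgs` is built in the OpenAI-style multimodal format that llama-cpp-python's vision chat handlers accept; the sketch below uses a placeholder image path and prompt rather than my real inputs:

```python
import base64

# Hypothetical helper: encode a local image as a data URI the chat handler can load
def image_to_data_uri(path):
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/jpeg;base64,{data}"

# Placeholder image and prompt; the real inputs differ
msgs = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_to_data_uri("test.jpg")}},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
```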
Every call leaks memory inside llama_chat_format.py.
I tried manually freeing the image_embed, but that did not help.
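For reference, a minimal sketch that should show the growth, assuming the `get_model()` and `msgs` from above; it reads VmRSS from /proc/self/status after each completion:

```python
# Leak-reproduction sketch (assumes get_model() and msgs from above; Linux only)
def rss_mb():
    # Current resident set size of this process, in MB
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024.0
    return 0.0

llm = get_model(settings.MMP_MODEL, settings.Q_MODEL)
for i in range(100):
    llm.create_chat_completion(max_tokens=20, stop=['。'], messages=msgs)
    # Expected: RSS levels off after warm-up; observed: it keeps climbing on every call
    print(f"iteration {i}: RSS = {rss_mb():.1f} MB")
```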
Later I built minicpmv-cli from llama.cpp and modified part of its code so it would run inference repeatedly, to see whether it leaks as well; after roughly 100 iterations it ran into a problem.
Expected Behavior
Is this a llama.cpp issue?
Steps To Reproduce
No response
Environment
- OS: Ubuntu 20.04
- Python: 3.10
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.2
Anything else?
No response