ggml_new_tensor_impl: not enough space in the context's memory pool #585
Comments
Try reducing the amount of memory the model has to reserve.
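(The suggestion above is cut off, so the parameter names below are a guess.) If the idea is to shrink what ggml has to allocate, the usual levers in llama-cpp-python are the n_ctx and n_batch arguments of llama_cpp.Llama. A minimal sketch, with a placeholder model path:

from llama_cpp import Llama

# A smaller context window and batch size shrink the scratch/KV buffers ggml reserves.
llm = Llama(
    model_path="path/to/ggml-model.bin",  # placeholder, not taken from this thread
    n_ctx=1024,   # context window size in tokens
    n_batch=256,  # prompt-processing batch size
)
print(llm("Q: Hello? A:", max_tokens=32)["choices"][0]["text"])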
@jiapei100 I'm also getting the same error. I'm using CUDA 11.4:
ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 536950912, available 536870912)
Segmentation fault
This is a llama.cpp error.
By changing that I now get a different error:
llama_tokenize_with_model: too many tokens
Traceback (most recent call last):
File "....../localGPT/run_localGPT.py", line 272, in <module>
main()
File "~/.local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "~/.local/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "~/.local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "~/.local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "....../localGPT/run_localGPT.py", line 250, in main
res = qa(query)
File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 258, in __call__
raise e
File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 252, in __call__
self._call(inputs, run_manager=run_manager)
File "~/.local/lib/python3.10/site-packages/langchain/chains/retrieval_qa/base.py", line 140, in _call
answer = self.combine_documents_chain.run(
File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 456, in run
return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 258, in __call__
raise e
File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 252, in __call__
self._call(inputs, run_manager=run_manager)
File "~/.local/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 106, in _call
output, extra_return_dict = self.combine_docs(
File "~/.local/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 165, in combine_docs
return self.llm_chain.predict(callbacks=callbacks, **inputs), {}
File "~/.local/lib/python3.10/site-packages/langchain/chains/llm.py", line 252, in predict
return self(kwargs, callbacks=callbacks)[self.output_key]
File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 258, in __call__
raise e
File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 252, in __call__
self._call(inputs, run_manager=run_manager)
File "~/.local/lib/python3.10/site-packages/langchain/chains/llm.py", line 92, in _call
response = self.generate([inputs], run_manager=run_manager)
File "~/.local/lib/python3.10/site-packages/langchain/chains/llm.py", line 102, in generate
return self.llm.generate_prompt(
File "~/.local/lib/python3.10/site-packages/langchain/llms/base.py", line 455, in generate_prompt
return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
File "~/.local/lib/python3.10/site-packages/langchain/llms/base.py", line 586, in generate
output = self._generate_helper(
File "~/.local/lib/python3.10/site-packages/langchain/llms/base.py", line 492, in _generate_helper
raise e
File "~/.local/lib/python3.10/site-packages/langchain/llms/base.py", line 479, in _generate_helper
self._generate(
File "~/.local/lib/python3.10/site-packages/langchain/llms/base.py", line 965, in _generate
self._call(prompt, stop=stop, run_manager=run_manager, **kwargs)
File "~/.local/lib/python3.10/site-packages/langchain/llms/llamacpp.py", line 255, in _call
for chunk in self._stream(
File "~/.local/lib/python3.10/site-packages/langchain/llms/llamacpp.py", line 305, in _stream
for part in result:
File "~/.local/lib/python3.10/site-packages/llama_cpp/llama.py", line 900, in _create_completion
raise ValueError(
ValueError: Requested tokens (1207) exceed context window of 1024
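The ValueError above means the assembled prompt (1207 tokens) is larger than the context window the model was loaded with (1024). Two ways out: raise n_ctx, or feed the chain fewer/shorter retrieved chunks so the prompt stays under the window. A hedged sketch using LangChain's LlamaCpp wrapper that appears in the traceback (the model path and numbers are placeholders, not values from this thread):

from langchain.llms import LlamaCpp

# Load the model with a context window big enough for the prompt plus the generated tokens.
llm = LlamaCpp(
    model_path="path/to/ggml-model.bin",  # placeholder
    n_ctx=2048,      # must cover the 1207-token prompt plus the tokens to generate
    n_batch=512,
    max_tokens=256,  # generation budget; it also has to fit inside n_ctx
)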
Sorry, you've now stumbled upon another bug, #462.
I have the same issue.
From the error you posted, I'd say it's most likely this.
Try llama-cpp-python==0.1.74 instead of the latest version (0.1.77); that worked well for me.
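For reference, pinning the package as suggested above (version number taken from that comment):

pip install llama-cpp-python==0.1.74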
Thank you.
I also got this error on versions 0.1.76 through 0.1.78 (prompt of 619 tokens, n_ctx=4096, n_batch=4096).
I believe this is a llama-cpp-python issue; please refer to:
PromtEngineer/localGPT#349 (comment)
I'm trying to run localGPT with CUDA 12.2 but got the ERROR message above.
Does it have something to do with llama-cpp-python and CUDA 12.2?