
ggml_new_tensor_impl: not enough space in the context's memory pool #585


Open · jiapei100 opened this issue Aug 8, 2023 · 11 comments
Labels: llama.cpp (Problem with llama.cpp shared lib)


@jiapei100

I believe this is a llama-cpp-python issue; please refer to:

PromtEngineer/localGPT#349 (comment)

I'm trying to run localGPT with CUDA 12.2, but got this error message:

......
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 18682928, available 10485760)

Does this have something to do with llama-cpp-python and CUDA 12.2?

@ja14000

ja14000 commented Aug 8, 2023

Try reducing max_ctx_size here to 512

Edit:
Correction: you should actually change this; 512 is already the default llama-cpp-python value. localGPT seems to set it equal to the context size (2048), which in your case requires more allocatable VRAM than is available. Disclaimer: I don't own a CUDA device, so this advice is based on my research into the same issue on my M1 device; I don't know enough about the CUDA implementation to say anything with certainty.
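
For reference, here is a minimal sketch of what that setting controls once it reaches llama-cpp-python, assuming the model is ultimately loaded through the Llama class (the model path and layer count below are placeholders, and exactly how localGPT forwards max_ctx_size is inferred, not verified here):

```python
# Hedged sketch: load a GGML model with a smaller context window so the
# per-context buffers fit in the memory that is actually available.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",  # placeholder path
    n_ctx=512,        # context window; localGPT's max_ctx_size appears to feed this
    n_batch=512,      # prompt tokens processed per eval batch
    n_gpu_layers=32,  # layers offloaded to the GPU; lower this if VRAM is tight
)
```

Lowering n_ctx shrinks the buffers llama.cpp reserves per context, which is what the "not enough space in the context's memory pool" allocation is failing on, at the cost of a smaller usable prompt.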

@taruntiwarihp

@jiapei100 I'm also getting the same error; I'm using CUDA 11.4.

ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 536950912, available 536870912)
Segmentation fault

gjmulder added the llama.cpp (Problem with llama.cpp shared lib) label on Aug 9, 2023
@gjmulder (Contributor)

gjmulder commented Aug 9, 2023

This is a llama.cpp error, not a llama-cpp-python error.

@jiapei100 (Author)

@ja14000

After changing max_ctx_size to 512 or 1024, I got this error: ValueError: Requested tokens (1207) exceed context window of 1024

llama_tokenize_with_model: too many tokens
Traceback (most recent call last):
  File "....../localGPT/run_localGPT.py", line 272, in <module>
    main()
  File "~/.local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "~/.local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "~/.local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "~/.local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "....../localGPT/run_localGPT.py", line 250, in main
    res = qa(query)
  File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 258, in __call__
    raise e
  File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 252, in __call__
    self._call(inputs, run_manager=run_manager)
  File "~/.local/lib/python3.10/site-packages/langchain/chains/retrieval_qa/base.py", line 140, in _call
    answer = self.combine_documents_chain.run(
  File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 456, in run
    return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
  File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 258, in __call__
    raise e
  File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 252, in __call__
    self._call(inputs, run_manager=run_manager)
  File "~/.local/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 106, in _call
    output, extra_return_dict = self.combine_docs(
  File "~/.local/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 165, in combine_docs
    return self.llm_chain.predict(callbacks=callbacks, **inputs), {}
  File "~/.local/lib/python3.10/site-packages/langchain/chains/llm.py", line 252, in predict
    return self(kwargs, callbacks=callbacks)[self.output_key]
  File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 258, in __call__
    raise e
  File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 252, in __call__
    self._call(inputs, run_manager=run_manager)
  File "~/.local/lib/python3.10/site-packages/langchain/chains/llm.py", line 92, in _call
    response = self.generate([inputs], run_manager=run_manager)
  File "~/.local/lib/python3.10/site-packages/langchain/chains/llm.py", line 102, in generate
    return self.llm.generate_prompt(
  File "~/.local/lib/python3.10/site-packages/langchain/llms/base.py", line 455, in generate_prompt
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
  File "~/.local/lib/python3.10/site-packages/langchain/llms/base.py", line 586, in generate
    output = self._generate_helper(
  File "~/.local/lib/python3.10/site-packages/langchain/llms/base.py", line 492, in _generate_helper
    raise e
  File "~/.local/lib/python3.10/site-packages/langchain/llms/base.py", line 479, in _generate_helper
    self._generate(
  File "~/.local/lib/python3.10/site-packages/langchain/llms/base.py", line 965, in _generate
    self._call(prompt, stop=stop, run_manager=run_manager, **kwargs)
  File "~/.local/lib/python3.10/site-packages/langchain/llms/llamacpp.py", line 255, in _call
    for chunk in self._stream(
  File "~/.local/lib/python3.10/site-packages/langchain/llms/llamacpp.py", line 305, in _stream
    for part in result:
  File "~/.local/lib/python3.10/site-packages/llama_cpp/llama.py", line 900, in _create_completion
    raise ValueError(
ValueError: Requested tokens (1207) exceed context window of 1024
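
For what it's worth, the two numbers in that ValueError are easy to inspect directly; a minimal sketch, assuming you call llama-cpp-python yourself (the localGPT chain assembles the prompt internally, so the prompt string here is a stand-in):

```python
# Hedged sketch: the retrieved documents plus the question are stuffed into a
# single prompt, so the prompt alone can exceed a small context window.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",  # placeholder path
    n_ctx=2048,
)

prompt = "..."  # stand-in for the stuffed documents + question built by the chain
n_prompt = len(llm.tokenize(prompt.encode("utf-8")))
print(f"prompt tokens: {n_prompt}; these must fit in n_ctx together with max_tokens")
```

With a 1024-token window and a 1207-token prompt the ValueError is expected; the options are a larger n_ctx, fewer or shorter retrieved chunks, or a smaller generation budget.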

@jiapei100 (Author)

jiapei100 commented Aug 9, 2023

@gjmulder

Do you know how to fix it? Where should I modify llama.cpp?

@gjmulder (Contributor)

gjmulder commented Aug 9, 2023

Sorry, you've now stumbled upon another bug #462.

@bmtuan

bmtuan commented Aug 9, 2023

I have the same issue. I've set n_ctx = 2048 for my model, but it doesn't work. Does anyone have a solution?

@ja14000

ja14000 commented Aug 9, 2023

> Requested tokens (1207) exceed context window of 1024

From the error you posted, I'd say it's most likely this: if you've changed max_ctx_size, change this to match.
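
Concretely, these are the two knobs that have to stay consistent; a minimal sketch, assuming localGPT builds the model through LangChain's LlamaCpp wrapper (which the traceback above suggests), with placeholder values:

```python
# Hedged sketch: keep the context window and the generation budget consistent.
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",  # placeholder path
    n_ctx=2048,      # context window handed down to llama.cpp
    max_tokens=512,  # generation budget; prompt tokens + max_tokens must fit in n_ctx
    n_batch=512,
    n_gpu_layers=32,
)
```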

@longzoho

longzoho commented Aug 15, 2023

Try llama-cpp-python==0.1.74 instead of the latest version 0.1.77; it worked well for me.

@943fansi

> Try llama-cpp-python==0.1.74 instead of the latest version 0.1.77; it worked well for me.

Thank you.

@rskvazh

rskvazh commented Aug 25, 2023

Also got this error on all versions 0.1.76–0.1.78 (prompt of 619 tokens, n_ctx=4096, n_batch=4096).
On 0.1.74 everything works fine.
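
If downgrading is not an option, one other thing worth trying, sketched below under the assumption that the model is constructed via llama-cpp-python directly and that the oversized scratch request comes from the large batch size (the numbers reported above miss the pool by only a small margin):

```python
# Hedged sketch: keep n_batch at its 512 default instead of matching it to
# n_ctx; the amount of scratch memory used grows with the number of tokens
# evaluated per batch, so a 4096-token batch asks for far more scratch space
# than a 512-token one.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",  # placeholder path
    n_ctx=4096,
    n_batch=512,
)
```

If that doesn't help, pinning llama-cpp-python to 0.1.74 as suggested above is the other workaround reported in this thread.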
