
ggml_new_tensor_impl: not enough space in the context's memory pool #585


Open · jiapei100 opened this issue Aug 8, 2023 · 11 comments
Labels: llama.cpp (Problem with llama.cpp shared lib)


@jiapei100

I believe this is a llama-cpp-python issue; please refer to:

PromtEngineer/localGPT#349 (comment)

I'm trying to run localGPT with CUDA 12.2, but got this error message:

......
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 18682928, available 10485760)

Does this have something to do with llama-cpp-python and CUDA 12.2?

@ja14000

ja14000 commented Aug 8, 2023

Try reducing max_ctx_size here to 512

Edit:
Correction: you should actually change this; 512 is already the default llama-cpp-python value. localGPT seems to set it equal to the context size (2048), which in your case requires more allocatable VRAM than is available. Disclaimer: I don't own a CUDA device, so this advice is based on my research into the same issue on my M1 device; I don't know enough about the CUDA implementation to say anything with certainty.
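
For reference, here is a minimal sketch of what that setting controls once it reaches llama-cpp-python, assuming the model is ultimately loaded through the Llama class (the model path and layer count below are placeholders, and exactly how localGPT forwards max_ctx_size is inferred, not verified here):

```python
# Hedged sketch: load a GGML model with a smaller context window so the
# per-context buffers fit in the memory that is actually available.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",  # placeholder path
    n_ctx=512,        # context window; localGPT's max_ctx_size appears to feed this
    n_batch=512,      # prompt tokens processed per eval batch
    n_gpu_layers=32,  # layers offloaded to the GPU; lower this if VRAM is tight
)
```

Lowering n_ctx shrinks the buffers llama.cpp reserves per context, which is what the "not enough space in the context's memory pool" allocation is failing on, at the cost of a smaller usable prompt.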

@taruntiwarihp

@jiapei100 I'm also getting the same error; I'm using CUDA 11.4.

ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 536950912, available 536870912)
Segmentation fault

gjmulder added the llama.cpp (Problem with llama.cpp shared lib) label on Aug 9, 2023
@gjmulder (Contributor)

gjmulder commented Aug 9, 2023

This is a llama.cpp error, not a llama-cpp-python error.

@jiapei100 (Author)

@ja14000

After changing max_ctx_size to 512 or 1024, I got this error: ValueError: Requested tokens (1207) exceed context window of 1024

llama_tokenize_with_model: too many tokens
Traceback (most recent call last):
  File "....../localGPT/run_localGPT.py", line 272, in <module>
    main()
  File "~/.local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "~/.local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "~/.local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "~/.local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "....../localGPT/run_localGPT.py", line 250, in main
    res = qa(query)
  File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 258, in __call__
    raise e
  File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 252, in __call__
    self._call(inputs, run_manager=run_manager)
  File "~/.local/lib/python3.10/site-packages/langchain/chains/retrieval_qa/base.py", line 140, in _call
    answer = self.combine_documents_chain.run(
  File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 456, in run
    return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
  File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 258, in __call__
    raise e
  File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 252, in __call__
    self._call(inputs, run_manager=run_manager)
  File "~/.local/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 106, in _call
    output, extra_return_dict = self.combine_docs(
  File "~/.local/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 165, in combine_docs
    return self.llm_chain.predict(callbacks=callbacks, **inputs), {}
  File "~/.local/lib/python3.10/site-packages/langchain/chains/llm.py", line 252, in predict
    return self(kwargs, callbacks=callbacks)[self.output_key]
  File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 258, in __call__
    raise e
  File "~/.local/lib/python3.10/site-packages/langchain/chains/base.py", line 252, in __call__
    self._call(inputs, run_manager=run_manager)
  File "~/.local/lib/python3.10/site-packages/langchain/chains/llm.py", line 92, in _call
    response = self.generate([inputs], run_manager=run_manager)
  File "~/.local/lib/python3.10/site-packages/langchain/chains/llm.py", line 102, in generate
    return self.llm.generate_prompt(
  File "~/.local/lib/python3.10/site-packages/langchain/llms/base.py", line 455, in generate_prompt
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
  File "~/.local/lib/python3.10/site-packages/langchain/llms/base.py", line 586, in generate
    output = self._generate_helper(
  File "~/.local/lib/python3.10/site-packages/langchain/llms/base.py", line 492, in _generate_helper
    raise e
  File "~/.local/lib/python3.10/site-packages/langchain/llms/base.py", line 479, in _generate_helper
    self._generate(
  File "~/.local/lib/python3.10/site-packages/langchain/llms/base.py", line 965, in _generate
    self._call(prompt, stop=stop, run_manager=run_manager, **kwargs)
  File "~/.local/lib/python3.10/site-packages/langchain/llms/llamacpp.py", line 255, in _call
    for chunk in self._stream(
  File "~/.local/lib/python3.10/site-packages/langchain/llms/llamacpp.py", line 305, in _stream
    for part in result:
  File "~/.local/lib/python3.10/site-packages/llama_cpp/llama.py", line 900, in _create_completion
    raise ValueError(
ValueError: Requested tokens (1207) exceed context window of 1024
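
For what it's worth, the two numbers in that ValueError are easy to inspect directly; a minimal sketch, assuming you call llama-cpp-python yourself (the localGPT chain assembles the prompt internally, so the prompt string here is a stand-in):

```python
# Hedged sketch: the retrieved documents plus the question are stuffed into a
# single prompt, so the prompt alone can exceed a small context window.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",  # placeholder path
    n_ctx=2048,
)

prompt = "..."  # stand-in for the stuffed documents + question built by the chain
n_prompt = len(llm.tokenize(prompt.encode("utf-8")))
print(f"prompt tokens: {n_prompt}; these must fit in n_ctx together with max_tokens")
```

With a 1024-token window and a 1207-token prompt the ValueError is expected; the options are a larger n_ctx, fewer or shorter retrieved chunks, or a smaller generation budget.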

@jiapei100 (Author)

jiapei100 commented Aug 9, 2023

@gjmulder

Do you know how to fix it? Where should I modify llama.cpp?

@gjmulder (Contributor)

gjmulder commented Aug 9, 2023

Sorry, you've now stumbled upon another bug #462.

@bmtuan

bmtuan commented Aug 9, 2023

I have the same issue. I've set n_ctx = 2048 for my model, but it doesn't work. Does anyone have a solution?

@ja14000

ja14000 commented Aug 9, 2023

> Requested tokens (1207) exceed context window of 1024

From the error you posted, I'd say it's most likely this: if you've changed max_ctx_size, change this to match.
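
Concretely, these are the two knobs that have to stay consistent; a minimal sketch, assuming localGPT builds the model through LangChain's LlamaCpp wrapper (which the traceback above suggests), with placeholder values:

```python
# Hedged sketch: keep the context window and the generation budget consistent.
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",  # placeholder path
    n_ctx=2048,      # context window handed down to llama.cpp
    max_tokens=512,  # generation budget; prompt tokens + max_tokens must fit in n_ctx
    n_batch=512,
    n_gpu_layers=32,
)
```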

@longzoho

longzoho commented Aug 15, 2023

Try llama-cpp-python==0.1.74 instead of the latest version 0.1.77; it worked well for me.

@943fansi

> Try llama-cpp-python==0.1.74 instead of the latest version 0.1.77; it worked well for me.

Thank you.

@rskvazh

rskvazh commented Aug 25, 2023

Also got this error on all versions 0.1.76–0.1.78 (prompt of 619 tokens, n_ctx=4096, n_batch=4096).
On 0.1.74 everything works fine.
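
If downgrading is not an option, one other thing worth trying, sketched below under the assumption that the model is constructed via llama-cpp-python directly and that the oversized scratch request comes from the large batch size (the numbers reported above miss the pool by only a small margin):

```python
# Hedged sketch: keep n_batch at its 512 default instead of matching it to
# n_ctx; the amount of scratch memory used grows with the number of tokens
# evaluated per batch, so a 4096-token batch asks for far more scratch space
# than a 512-token one.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",  # placeholder path
    n_ctx=4096,
    n_batch=512,
)
```

If that doesn't help, pinning llama-cpp-python to 0.1.74 as suggested above is the other workaround reported in this thread.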
