Models not working on version 0.1.69, fail with an exception: Requested tokens (85) exceed context window of 2048 #462

Closed
Kenshiro-28 opened this issue Jul 9, 2023 · 7 comments
Labels: bug (Something isn't working)

Kenshiro-28 commented Jul 9, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

When I send a prompt to the model, I should receive a response (using version 0.1.69).

Current Behavior

It generates an exception:

"Requested tokens (85) exceed context window of 2048"

Environment and Context

  • Physical (or virtual) hardware you are using, e.g. for Linux:

AMD Ryzen 5 3600 6-Core Processor

  • Operating System, e.g. for Linux:

Debian 12

Linux 6.1.0-10-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.37-1 (2023-07-03) x86_64 GNU/Linux

  • SDK version, e.g. for Linux:

Python 3.11.2
GNU Make 4.3
g++ (Debian 12.2.0-14) 12.2.0

Failure Information (for bugs)

It generates an exception:

"Requested tokens (85) exceed context window of 2048"

Steps to Reproduce

  1. Just call the model with a normal prompt:
    ...
    MAX_TOKENS = 2048
    ...
    model = Llama(model_path = modelFile, n_ctx = MAX_TOKENS)
    ...
    response = model(text, max_tokens = MAX_TOKENS - text_tokens)

The code I'm using works fine in previous versions.
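
For context, the 0.1.69 check appears to count the prompt tokens and max_tokens together against n_ctx, so the snippet above can fail if text_tokens is computed differently from the library's own count (e.g. without a BOS token). Below is a minimal workaround sketch, assuming the public llama-cpp-python API (Llama and Llama.tokenize) and a hypothetical model path: it measures the prompt with the library's own tokenizer and clamps the completion budget to the remaining space.

    from llama_cpp import Llama

    MAX_TOKENS = 2048
    # Hypothetical model path, for illustration only
    model = Llama(model_path="./wizard-vicuna-7b.ggmlv3.q4_0.bin", n_ctx=MAX_TOKENS)

    text = "What is the capital of France?"
    # Count the prompt with the same tokenizer the library uses internally
    # (tokenize adds a BOS token by default, so this may exceed an external count)
    prompt_tokens = len(model.tokenize(text.encode("utf-8")))

    # Keep prompt_tokens + max_tokens <= n_ctx so the limit is never exceeded
    response = model(text, max_tokens=MAX_TOKENS - prompt_tokens)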

gjmulder added the llama.cpp (Problem with llama.cpp shared lib) label Jul 9, 2023
gjmulder (Contributor) commented Jul 9, 2023

See #416

EDIT: @abetlen made a change to ensure no more tokens are requested than llama models support, namely 2048.

A better solution may be forthcoming, but it seems to still be in the design and testing stage upstream:

Extending context size via RoPE scaling.
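
For reference, here is a sketch of the kind of guard that raises this exception; it is an assumption for illustration, not the library's actual source. The call fails when the prompt plus the requested completion budget exceeds the context window, which can trip callers that counted the prompt with different tokenizer settings.

    # Hypothetical reconstruction of the guard, for illustration only
    def check_token_budget(n_prompt: int, max_tokens: int, n_ctx: int) -> None:
        # Reject requests whose prompt plus completion budget cannot fit
        if n_prompt + max_tokens > n_ctx:
            raise ValueError(
                f"Requested tokens ({n_prompt}) exceed context window of {n_ctx}"
            )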

Kenshiro-28 (Author) commented:

I'm using models with a 2048-token context and requesting a context window of the same size. It fails even with Wizard-Vicuna:

https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML

gjmulder added the bug (Something isn't working) label and removed the llama.cpp (Problem with llama.cpp shared lib) label Jul 9, 2023
gjmulder (Contributor) commented Jul 9, 2023

Sorry. My mistake. I was working on a branch and hadn't pulled in the latest updates.

Here's the PR #385

Kenshiro-28 (Author) commented:

Ok, thx! :) Will it be fixed in 0.1.70?

gjmulder (Contributor) commented Jul 9, 2023

Not by me.

abetlen (Owner) commented Jul 9, 2023

@Kenshiro-28 yes, I'll revert this so that it just truncates max_tokens to the context length by default; this was changed in a recent PR.
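
A sketch of the truncation behavior described above; this is an assumption about the fix, not the actual patch. Rather than raising, the default would clamp the completion budget to whatever context space the prompt leaves free:

    # Hypothetical helper illustrating the described default behavior
    def clamp_max_tokens(n_prompt: int, max_tokens: int, n_ctx: int) -> int:
        # Truncate max_tokens so prompt + completion always fits in n_ctx
        return min(max_tokens, n_ctx - n_prompt)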

Kenshiro-28 (Author) commented:

@abetlen great, thank you! :)
