Models not working on version 0.1.69, fail with an exception: Requested tokens (85) exceed context window of 2048 #462
Comments
I'm using models with a 2048-token context, requesting a context of the same size. It fails even with Wizard-Vicuna: https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML
Sorry. My mistake. I was working on a branch and hadn't pulled in the latest updates. Here's the PR #385
Ok, thx! :) Will it be fixed in 0.1.70?
Not by me.
@Kenshiro-28 yes, I'll revert this back so that it just truncates max_tokens by default to the context length; this was changed in a recent PR.
@abetlen great, thank you! :) |
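
For reference, a minimal sketch of the clamping behaviour @abetlen describes above (truncate max_tokens to whatever still fits in the window instead of raising); the helper name and its arguments are illustrative, not llama-cpp-python's actual internals:

def clamp_max_tokens(requested_max_tokens, prompt_token_count, n_ctx):
    # Hypothetical helper: keep the completion budget inside the context
    # window rather than raising when prompt + requested tokens would overflow.
    remaining = n_ctx - prompt_token_count
    if remaining <= 0:
        raise ValueError(
            f"Prompt alone ({prompt_token_count} tokens) fills the "
            f"context window of {n_ctx}"
        )
    return min(requested_max_tokens, remaining)

# An 85-token prompt with a 2048-token request in a 2048-token window
# is clamped to 1963 instead of raising an exception.
print(clamp_max_tokens(2048, 85, 2048))  # -> 1963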
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
When I send a prompt to the model, I should receive a response (using version 0.1.69).
Current Behavior
It generates an exception:
"Requested tokens (85) exceed context window of 2048"
Environment and Context
AMD Ryzen 5 3600 6-Core Processor
Debian 12
Linux 6.1.0-10-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.37-1 (2023-07-03) x86_64 GNU/Linux
Python 3.11.2
GNU Make 4.3
g++ (Debian 12.2.0-14) 12.2.0
Failure Information (for bugs)
It generates an exception:
"Requested tokens (85) exceed context window of 2048"
Steps to Reproduce
...
MAX_TOKENS = 2048
...
model = Llama(model_path=modelFile, n_ctx=MAX_TOKENS)
...
# text_tokens holds the prompt's token count, computed in the elided code above
response = model(text, max_tokens=MAX_TOKENS - text_tokens)
The code I'm using works fine in previous versions.
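
One possible workaround until the revert lands (an untested sketch, assuming Llama.tokenize() accepts UTF-8 bytes, as in recent llama-cpp-python releases): size max_tokens from the model's own token count rather than an externally computed one, so the prompt plus the completion budget never exceeds n_ctx. The model path and prompt below are placeholders.

from llama_cpp import Llama

MAX_TOKENS = 2048
modelFile = "/path/to/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin"  # placeholder path
text = "Hello, how are you?"                                        # example prompt

model = Llama(model_path=modelFile, n_ctx=MAX_TOKENS)

# Count the prompt with the model's own tokenizer so the budget matches
# what the library will see internally.
prompt_tokens = len(model.tokenize(text.encode("utf-8")))
budget = MAX_TOKENS - prompt_tokens
if budget <= 0:
    raise ValueError("Prompt leaves no room for a completion")

response = model(text, max_tokens=budget)

Whether this side-steps the 0.1.69 check depends on how the library counts the prompt internally (e.g. an added BOS token), so treat it as a starting point rather than a confirmed fix.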