Description
LocalAI version:
v2.20.1 a9c521eb41dc2dd63769e5362f05d9ab5d8bec50
Environment, CPU architecture, OS, and Version:
OS: 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux
ENV: Docker version 26.0.1, build d260a54
HW: i9-10900F, RTX3080, 128GB RAM
Describe the bug
When using the `v1/chat/completions` endpoint with the `max_tokens` parameter set to a specific value, the completion may be cut off, but the `finish_reason` remains `stop` instead of changing to `length`, making it difficult to determine whether the answer is complete.

Additionally, even when the `max_tokens` property is not set, the response may still be cut off while the `finish_reason` remains `stop`.
To Reproduce
- Send a request to the `v1/chat/completions` endpoint with the `max_tokens` property set to a specific value (e.g., 20), as in the sketch below.
- Observe the response: the output is truncated, but `finish_reason` is still `stop`.
Expected behavior
When the `max_tokens` property is set, the response should clearly indicate whether the completion is complete. If the completion is cut off, the `finish_reason` should be `length` instead of `stop`.
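
As an illustration of the expected contract, a client-side check like the following (reusing `choice` from the sketch above) should pass for a truncated completion, but currently fails because `stop` is reported:

```python
# Per the OpenAI-compatible API contract, a completion truncated by
# max_tokens should carry finish_reason == "length". With the current
# behavior this assertion fails, since LocalAI reports "stop".
assert choice["finish_reason"] == "length", (
    f"completion was cut off but finish_reason is {choice['finish_reason']!r}"
)
```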