
Confusing finish_reason when using max_tokens property in 'v1/chat/completions' endpoint #3533

@daJuels

Description


LocalAI version:

v2.20.1 a9c521eb41dc2dd63769e5362f05d9ab5d8bec50

Environment, CPU architecture, OS, and Version:
OS: 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux
ENV: Docker version 26.0.1, build d260a54
HW: i9-10900F, RTX3080, 128GB RAM

Describe the bug
When calling the v1/chat/completions endpoint with the max_tokens parameter set, the completion may be cut off at the token limit, but the finish_reason remains 'stop' instead of 'length', making it impossible to tell whether the answer is complete.

Additionally, even when the max_tokens property is not set, the response may still be cut off, yet the finish_reason remains 'stop'.

To Reproduce

  1. Send a request to the v1/chat/completions endpoint with the max_tokens property set to a low value (e.g., 20).
  2. Observe that the returned text is cut off mid-sentence, yet the finish_reason field is 'stop' rather than 'length' (a reproduction sketch follows below).
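
A minimal reproduction sketch in Python; the base URL, API key handling, and model name are placeholders, so adjust them to your LocalAI deployment:

```python
import requests

# Placeholder values; adjust to your LocalAI deployment.
BASE_URL = "http://localhost:8080"
MODEL = "your-model-name"

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Explain how a CPU cache works."}],
        "max_tokens": 20,  # deliberately low so the completion gets cut off
    },
)
resp.raise_for_status()
choice = resp.json()["choices"][0]
print(choice["finish_reason"])        # observed: 'stop'; expected: 'length'
print(choice["message"]["content"])   # visibly truncated mid-sentence
```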

Expected behavior
When the max_tokens property is set and the completion is cut off at the token limit, the finish_reason should be 'length' instead of 'stop', so that clients can reliably tell whether the answer is complete.
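
Until this is fixed, a possible client-side heuristic (assuming the response includes the standard OpenAI-style usage block) is to compare the reported completion token count against the requested limit:

```python
def looks_truncated(response_json: dict, requested_max_tokens: int) -> bool:
    """Heuristic: if the model used the whole token budget, the
    completion was probably cut off, regardless of finish_reason."""
    usage = response_json.get("usage", {})
    return usage.get("completion_tokens", 0) >= requested_max_tokens
```

This is only a workaround; it cannot detect truncation in the case described above where max_tokens is not set at all.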
