
Confusing finish_reason when using max_tokens property in 'v1/chat/completions' endpoint #3533

@daJuels

Description


LocalAI version:

v2.20.1 a9c521eb41dc2dd63769e5362f05d9ab5d8bec50

Environment, CPU architecture, OS, and Version:
OS: 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux
ENV: Docker version 26.0.1, build d260a54
HW: i9-10900F, RTX3080, 128GB RAM

Describe the bug
When calling the v1/chat/completions endpoint with the max_tokens parameter set, the completion may be cut off at the token limit, but the finish_reason remains 'stop' instead of 'length', making it impossible to tell whether the answer is complete.

Additionally, even when the max_tokens property is not set, the response may still be cut off, yet the finish_reason remains 'stop'.

To Reproduce

  1. Send a request to the v1/chat/completions endpoint with the max_tokens property set to a low value (e.g., 20).
  2. Observe that the returned text is cut off mid-sentence, yet the finish_reason field is 'stop' rather than 'length' (a reproduction sketch follows below).
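
A minimal reproduction sketch in Python; the base URL, API key handling, and model name are placeholders, so adjust them to your LocalAI deployment:

```python
import requests

# Placeholder values; adjust to your LocalAI deployment.
BASE_URL = "http://localhost:8080"
MODEL = "your-model-name"

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Explain how a CPU cache works."}],
        "max_tokens": 20,  # deliberately low so the completion gets cut off
    },
)
resp.raise_for_status()
choice = resp.json()["choices"][0]
print(choice["finish_reason"])        # observed: 'stop'; expected: 'length'
print(choice["message"]["content"])   # visibly truncated mid-sentence
```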

Expected behavior
When the max_tokens property is set and the completion is cut off at the token limit, the finish_reason should be 'length' instead of 'stop', so that clients can reliably tell whether the answer is complete.
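
Until this is fixed, a possible client-side heuristic (assuming the response includes the standard OpenAI-style usage block) is to compare the reported completion token count against the requested limit:

```python
def looks_truncated(response_json: dict, requested_max_tokens: int) -> bool:
    """Heuristic: if the model used the whole token budget, the
    completion was probably cut off, regardless of finish_reason."""
    usage = response_json.get("usage", {})
    return usage.get("completion_tokens", 0) >= requested_max_tokens
```

This is only a workaround; it cannot detect truncation in the case described above where max_tokens is not set at all.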
