Unicode/UTF-8 Character Handling Issue in REST API "/v1/chat/completions" Endpoint #804

@wanderingmeow

🐛 Bug

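When streaming from the REST API "/v1/chat/completions" endpoint, multi-byte UTF-8 characters (CJK characters, emoji) come back as replacement characters (�) instead of the intended text.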

To Reproduce

Steps to reproduce the behavior:

  1. Spawn a REST API server with a model that supports outputting CJK characters or emoji.
  2. Send a streaming request:
> curl --data '{"model":"","messages":[{"role":"user","content":"Introduce yourself with lots of emojis"}],"stream":true}' --header 'Content-Type: application/json' http://127.0.0.1:8000/v1/chat/completions
  3. Observe the output:
data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Hello!"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":" "},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"�"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"�"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"�"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"�"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"�"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"�"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":" I"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"'"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"m"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":" just"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":" an"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":" A"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"I"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":" assistant"},"finish_reason":"stop"}]}

...
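
A likely cause, stated as an assumption rather than something confirmed against the MLC-LLM source: an emoji such as 😊 takes four bytes in UTF-8, and if those bytes arrive spread over several generated tokens, decoding each partial piece on its own yields U+FFFD (�). The sketch below uses only the Python standard library; the one-byte-per-token split and the function names are hypothetical and only illustrate the effect.

```python
import codecs

EMOJI_BYTES = "😊".encode("utf-8")  # four bytes: f0 9f 98 8a


def decode_each_piece(pieces):
    """Decode every piece independently; incomplete sequences become U+FFFD."""
    return [p.decode("utf-8", errors="replace") for p in pieces]


def decode_incrementally(pieces):
    """Buffer incomplete sequences until a full character is available."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    return [decoder.decode(p) for p in pieces]


# Hypothetical split: one byte per generated token.
pieces = [EMOJI_BYTES[i:i + 1] for i in range(len(EMOJI_BYTES))]
naive = decode_each_piece(pieces)        # replacement characters only
buffered = decode_incrementally(pieces)  # empty strings, then the emoji
```

The incremental decoder returns an empty string while the character is incomplete, so nothing invalid would need to be sent to the client.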

Expected behavior

Expected output:

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Hello!"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":" "},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"😊"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"👋"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":" I"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"'"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"m"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":" just"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":" an"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":" A"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"I"},"finish_reason":"stop"}]}

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":" assistant"},"finish_reason":"stop"}]}

...
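
To compare actual and expected output without reading the chunks by eye, the `data:` lines can be joined back into one string on the client side. This is a minimal sketch, assuming the stream format shown above; `join_stream_content` is a hypothetical helper, and the `[DONE]` sentinel check is an assumption borrowed from the OpenAI streaming convention, since it does not appear in the output above.

```python
import json


def join_stream_content(sse_lines):
    """Concatenate the delta contents of 'data: {...}' stream lines."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and anything else
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":  # assumed end-of-stream marker
            continue
        event = json.loads(payload)
        for choice in event.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


def has_replacement_chars(text):
    """True if the joined text still contains U+FFFD."""
    return "\ufffd" in text
```

Feeding the lines from the curl output into `join_stream_content` and then checking `has_replacement_chars` gives a quick pass/fail signal for this bug.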

Environment

  • Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): Metal
  • Operating system (e.g. Ubuntu/Windows/MacOS/...): macOS
  • Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...)
  • How you installed MLC-LLM (conda, source): source
  • How you installed TVM-Unity (pip, source): pip
  • Python version (e.g. 3.10): 3.11
  • GPU driver version (if applicable):
  • CUDA/cuDNN version (if applicable):
  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
  • Any other relevant information:

Potential fix

prev_txt = ""
async for content in AsyncChatCompletionStream():
    if content:
        chunk = ChatCompletionStreamResponse(
            choices=[
                ChatCompletionResponseStreamChoice(
                    index=0,
                    delta=DeltaMessage(
                        role="assistant", content=content[len(prev_txt) :]
                    ),
                    finish_reason="stop",
                )
            ]
        )
        prev_txt = content

prev_txt = ""
async for content in AsyncChatCompletionStream():
    if content:
        valid_content = content.replace('�', '')
        chunk = ChatCompletionStreamResponse(
            choices=[
                ChatCompletionResponseStreamChoice(
                    index=0,
                    delta=DeltaMessage(
                        role="assistant", content=valid_content[len(prev_txt):]
                    ),
                    finish_reason="stop",
                )
            ]
        )
        prev_txt = valid_content
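
A possible variant of the same idea, shown standalone so it can be run without the server: hold back only trailing replacement characters and skip empty deltas, instead of removing every U+FFFD from the text. This is a sketch under the assumption that each `content` value is the full text generated so far (which the `content[len(prev_txt):]` slicing above suggests); `stable_deltas` is a hypothetical name, not part of MLC-LLM.

```python
REPLACEMENT = "\ufffd"


def stable_deltas(snapshots):
    """Yield new text from cumulative snapshots, deferring trailing U+FFFD."""
    prev_txt = ""
    for content in snapshots:
        # A trailing U+FFFD may be an incomplete character; wait for more bytes.
        stable = content.rstrip(REPLACEMENT)
        delta = stable[len(prev_txt):]
        if delta:  # skip empty chunks entirely
            yield delta
            prev_txt = stable


snapshots = ["Hello!", "Hello! ", "Hello! \ufffd", "Hello! 😊", "Hello! 😊 I"]
deltas = list(stable_deltas(snapshots))
```

One trade-off: a U+FFFD that the model genuinely emits as the very last character of the stream would be dropped, so a final flush step might be needed in a real implementation.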
